Abstract

Concurrency control methods are studied in two contexts: centralized and
distributed database management systems. Models are presented for studying
the performance of the many algorithms that have been proposed to keep
databases consistent.

For the centralized case a heuristic analytical model is solved iteratively
and results concerning optimal granularity are derived. A proof that for all
reasonable systems the iteration converges is given. Additionally, a simple
test that determines whether the point of convergence is unique is provided.

Previous  studies  of  the  performance  of  concurrency  control  mechanisms 
in  distributed  database  management  systems  have  not  considered  workload 
characteristics  or  system  constraints.  These  factors  are  incorporated  into  a 
framework  that  enables  an  easier  choice  of  concurrency  control  algorithm  for  a 
distributed  database  management  system.  Finally,  two  distributed  algorithms 
are compared using a simulation model and the effects of various system and
workload parameters are investigated.

Chapter One

Modelling  of  Concurrency  Control  Mechanisms 


1.1.  Introduction 

In any multi-user Database Management System (DBMS), it is necessary to
provide concurrency control to coordinate the actions of users who want to
access the database at the same time. There is an implicit assumption that the
shared database satisfies certain consistency constraints. However, it is not
possible to have consistency constraints enforced at each action (i.e., each
write). For example, when transferring money from one bank account to
another, there will be an instant during which one account has been debited
and the other not yet credited. This violates the obvious constraint that the
total number of dollars remains constant. To overcome this, the actions of a
process are grouped into a transaction, which becomes the unit of consistency.
Each transaction preserves consistency if run alone, and concurrency control
protocols disallow any interleavings of transactions that would not guarantee
consistency. These protocols, which are called concurrency control methods,
ensure that constituent actions are processed in an acceptable order or that
the transaction is aborted if inconsistencies would arise. The problem of
concurrency control is magnified in a distributed database management system
(DDBMS), where the data is geographically distributed among different sites in
a network, for three reasons: (1) users may access data that is stored at
various sites, (2) the concurrency control mechanism does not know
instantaneously about actions taking place at remote sites, and (3) there may
be multiple copies of the data items.

This thesis is divided into two major sections, both of which deal with
concurrency control methods. The first section concerns centralized database
management systems where all the data is stored at a single site and presents a
model for estimating the performance of a particular locking algorithm. The
model is described by several equations of the form w = f(w) that relate the
expected waiting times at different points in the system. The waiting times
used were chosen so that when summed, they include the entire waiting time of
the jobs but include no part twice. The equations can then be solved using
successive substitution and can be proven to converge when modelling many real
systems. A simple test is given to determine whether the point of convergence
is unique. Once the equations are solved within the desired tolerance, the
throughput and response times can be generated from the expected waiting
times using simple formulae.
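
As an illustration only, successive substitution on equations of the form
w = f(w) can be sketched in a few lines of Python. The function `toy_f` below
is a hypothetical two-equation system chosen so that the iteration contracts;
it is not the model derived in Chapters 2 and 3, and the tolerance and
iteration limit are assumed values.

```python
from typing import Callable, List, Tuple


def successive_substitution(
    f: Callable[[List[float]], List[float]],
    w0: List[float],
    tol: float = 1e-9,
    max_iter: int = 10_000,
) -> Tuple[List[float], int]:
    """Iterate w <- f(w) until successive iterates differ by less than tol."""
    w = list(w0)
    for iteration in range(1, max_iter + 1):
        w_next = f(w)
        if max(abs(a - b) for a, b in zip(w_next, w)) < tol:
            return w_next, iteration
        w = w_next
    raise RuntimeError("iteration did not converge within max_iter steps")


def toy_f(w: List[float]) -> List[float]:
    """Hypothetical system: each waiting time depends linearly on the other."""
    w1, w2 = w
    return [1.0 + 0.3 * w2, 0.5 + 0.2 * w1]


if __name__ == "__main__":
    solution, steps = successive_substitution(toy_f, [0.0, 0.0])
    print(solution, steps)
```

Because `toy_f` is linear, its fixed point can be checked by hand
(w1 = 1.15/0.94); for the non-linear equations of a real model, convergence
and uniqueness have to be argued separately, as the thesis does.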

A simulation of the same system was also used to test additional assumptions
made when constructing the analytic model. The centralized database
modelling problem considered in Chapters 2 and 3 is really a special case of the
more general problem of modelling simultaneous resource possession using an
analytic model. Allowing transactions to use more than one resource
simultaneously produces models that are difficult to solve using traditional
queueing network solutions because in general these systems do not satisfy
local balance and therefore are not amenable to efficient exact solution using
convolution [DeBu78] or mean value analysis [ReLa76], for example.* One reason
that these systems do not satisfy the local balance assumptions is that locks
are held for an amount of time that depends on the amount of time spent
accessing other resources in the system, for example disks. This, coupled with
the fact that while jobs are holding locks, other jobs requesting the locked
data items may not proceed, violates the local balance assumptions, forcing the
use of global balance techniques or heuristic approximate techniques such as
the one presented in this thesis.

The second major section of this thesis deals with analyzing the performance
of concurrency control mechanisms in a distributed DBMS. A framework
that captures the essence of a large number of concurrency control methods is
described, and many of the major concurrency control methods proposed to
date are cast in this framework. The complete detailed specification is given in
Appendix B and an outline of each method is included in the body of the thesis.
Once the algorithms have been cast in the framework, they can be compared
more easily because the features that most influence performance become
visible, e.g., the number of messages being transmitted and the points at which
a transaction may be blocked or restarted. Certain tradeoffs become apparent
and situations where each algorithm performs well or poorly can be specified,
enabling a system designer to make a choice among algorithms. Examples are
given showing how to use the proposed technique for choosing a concurrency
control mechanism in a distributed database management system.

Previous work in this area has not considered workload characteristics or
unique system constraints in the comparison of concurrency control
algorithms. The choice of algorithm has been made without regard to the
particular DDBMS under study. This thesis takes into account as many as
possible of these characteristics and constraints and uses the description of
the system to help choose a concurrency control mechanism.

Two distributed concurrency control algorithms that use very different
methods for synchronizing transactions are simulated and the results are
tabulated. Various parameter settings are investigated and algorithm
performance is evaluated. Finally, conclusions and directions for future
research in the area of distributed database performance prediction are noted.


*Local balance systems are of a special form described in detail in [BCMP75].
They are special because they are easily solved using fast, efficient solution
techniques called product-form solution techniques [ChSa78].

Chapter Two

Concurrency  Control  in  Centralized  Database 
Management  Systems 


2.1.  Introduction 

Analytic performance evaluation has traditionally been concerned primarily
with the modelling of computer systems that lack a data management
component. Such systems typically involve complex manipulations of simple
data. An information management system, which makes simple manipulations
of complex data, appears to be much more difficult to model analytically.
Certain aspects of these information management systems are not found in
traditional computer systems. Examples of the differences are the choice of
data model and the choice of file structure. The impact on performance of these
design decisions is not currently well understood. To be able to model DBMSs
accurately, these design alternatives will have to be modelled explicitly.

Systems without a data management component do not provide elaborate
facilities for sharing data since users typically do not require access to files
other than their own. DBMSs are specifically intended to make it possible for
users to access databases concurrently, and this concurrent access may be
desirable as a means of improving performance. Concurrency control mechanisms
ensure that the constituent actions of transactions are processed in their
correct order and that the interleaving of transactions occurs in an acceptable
way.

The number of possible interleavings is large. For example, if, in some
interval, transactions $T_1, T_2, \ldots, T_k$ make $d_1, d_2, \ldots, d_k$
requests (reads and writes) respectively, then the number of ways that these
requests can be interleaved is given by $\#(d_1, d_2, \ldots, d_k)$, which can
be written as:

\[
\#(d_1, d_2, \ldots, d_k) =
\begin{cases}
1 & \text{if } d_1 = \cdots = d_{j-1} = d_{j+1} = \cdots = d_k = 0 \text{ and } d_j \ge 0, \\
0 & \text{if any of } d_1, d_2, \ldots, d_k < 0, \\
\sum_{j=1}^{k} \#(d_1, \ldots, d_j - 1, \ldots, d_k) & \text{otherwise.}
\end{cases}
\]

$\#(d_1, d_2, \ldots, d_k)$ assumes, of course, that as soon as any
transaction's request is finished, the next request is chosen with equal
probability from the transactions that are not completed. If
$d_1 = d_2 = \cdots = d_k = d$, then a nice closed form expression can be
written:

\[
\#(d, d, \ldots, d) = \frac{(kd)!}{(d!)^k}
\]

Table 2.1 shows the exponential growth with which concurrency control
mechanisms must cope to ensure that only legitimate interleavings are allowed
to occur. The number of legitimate interleavings depends on the particular
concurrency control mechanism being used and the read and write sets of the
transactions. The quantity is difficult to estimate. Consider an example of a
DBMS where each of three transactions makes one read and one write. Then
$k = 3$ and $d_1 = d_2 = d_3 = 2$ and the number of interleavings is 90. If
there are four transactions, $k = 4$, $d_1 = d_2 = d_3 = d_4 = 2$, and
$\#(2, 2, 2, 2) = 2520$. See Table 2.1 for additional values.


  k    d    Number of interleavings
  3    2                         90
  4    2                       2520
  5    2                     113400
  3    3                       1680
  3    4                      34650

Table 2.1
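
The interleaving count described above can be checked numerically. The
following Python sketch is an illustration rather than part of the original
analysis; the function names are hypothetical. It evaluates the count both by
the recursion (one term for each transaction that could issue the next
request) and by the multinomial closed form, which reduces to (kd)!/(d!)^k
when all transactions make d requests.

```python
from functools import lru_cache
from math import factorial
from typing import Tuple


@lru_cache(maxsize=None)
def count_recursive(ds: Tuple[int, ...]) -> int:
    """Count interleavings by choosing which transaction issues the next request."""
    if any(d < 0 for d in ds):
        return 0  # a transaction cannot have issued more requests than it has
    if sum(1 for d in ds if d > 0) <= 1:
        return 1  # at most one transaction still has requests: a single order
    return sum(
        count_recursive(ds[:j] + (ds[j] - 1,) + ds[j + 1:])
        for j in range(len(ds))
    )


def count_closed_form(ds: Tuple[int, ...]) -> int:
    """Multinomial coefficient (d1 + ... + dk)! / (d1! * ... * dk!)."""
    total = factorial(sum(ds))
    for d in ds:
        total //= factorial(d)
    return total


if __name__ == "__main__":
    for k, d in [(3, 2), (4, 2), (5, 2), (3, 4)]:
        ds = (d,) * k
        print(k, d, count_recursive(ds), count_closed_form(ds))
```

For three transactions of two requests each both functions give 90, and for
four such transactions both give 2520, in agreement with the worked example in
the text.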

Many concurrency control algorithms use locking to ensure a correct
order. An order is correct if it is serializable. Bernstein and Goodman discuss
how to determine whether an order is serializable [BeGo80]. Many locking
algorithms for centralized databases have been described in the literature
[Lom77, GLP75, CBT74]. There is a cost associated with obtaining concurrency,
and that is the overhead generated by these locking algorithms. Recent work has
been done to try to characterize the tradeoff between locking algorithm
overhead and the level of concurrency [RiSt77, RiSt79, Rie79, IrLi78, PoLe80].

The  granularity  of  a  database  is  characterized  by  the  size  of  its  lockable 
objects,  among  other  things.  Fine  granularity  means  the  database  is  broken  up 
into  many  small  lockable  objects  or  granules.  Coarse  granularity  means  the 
database  is  broken  up  into  fewer  large  granules.  If  there  are  a  large  number  of 
granules  then  there  is  potentially  a  large  locking  overhead,  but  more  processes 
are  able  to  run  concurrently  without  blocking  each  other.  If  the  granularity  is 
coarse,  then  the  possibility  for  a  high  degree  of  concurrency  is  lowered 
because  the  processes  will  be  in  competition  for  control  of  the  same  granules 
more  frequently. 

Fine granularity is only appropriate for simple transactions that access
only a few records, because, if a transaction accesses many records
simultaneously, there can be many locks required. Each access incurs the
computational overhead of setting and perhaps waiting for a lock and the
storage overhead of representing the information needed for a lock. Coarse
granularity is preferable for transactions that access many records at one
time.

When designing a new computer system there are typically many design
decisions to be made. Modelling is one means of narrowing the range of
possible choices for a given workload. Modelling is also useful in planning to
meet anticipated increases in an installation workload. There are three general
classes of modelling solution techniques in use: statistical, analytical, and
simulation. Statistical models are constructed by fitting equations to
measurement data. They are not as useful as they might seem, because they are
difficult to use for prediction if, for example, the workload changes or a new
scheduling discipline is put into use.

One form of analytic model is created by forming equations that capture
the system's behavior. Once the equations have been formulated, they are
solved, yielding performance measures such as throughput (number of jobs
processed per unit time) and response time (time units per job processed). The
models can be solved quickly in many cases and have produced usefully
accurate results [Bar80]. One drawback of analytic models is that they usually
produce only equilibrium performance statistics.

Simulation is a third technique for obtaining performance measures for
computer system models. Simulation models have been used to investigate
whether additional assumptions made in analytic models are reasonable. They
have the advantage that accurate results* can be obtained as the accuracy is a
function of the level of detail placed in the model. Another advantage of
simulation models is that they are capable of producing distributions of
performance measures, while many widely used analytic models are restricted to
averages of such performance measures as response times. On the negative
side, simulation models are difficult to implement and validate, and can
consume large amounts of computer time to obtain accurate solutions. In fact,
Bard and Sauer state that simulation should be considered only when a suitable
analytic model is not available [BaSa80].

Hybrid solution techniques that use more than one modelling strategy
have been used to minimize the disadvantages of the constituent strategies.
For example, complicated scheduling policies may be difficult to model using
an analytic model. It may be possible to decompose the model into two levels,
where an analytic model is used at the lower level to model the interactions
among the I/O devices and the results from the lower level model are then
plugged into an upper level simulation model that models the complicated
scheduling policy. An example of a situation where it is desirable to use a
simulation at the lower level and an analytic model at the higher level is given
later in this thesis. Another example of a hybrid simulation-analytic model is
given in [Sch78].

2.1.1. Previous Performance Results

Very little work has been done to characterize the tradeoff between locking
overhead and the degree of allowable parallelism. Ries and Stonebraker did a
detailed simulation study [RiSt77, RiSt79, Rie79]. Analytic models that made
different assumptions about the operating environment but were far less
expensive to run were described by Irani and Lin [IrLi78] and later by Potier
and Leblanc [PoLe80]. This section reviews these efforts and others in order to
understand the work described in the remaining sections of this thesis.

Ries and Stonebraker describe a detailed study of the locking granularity
problem for centralized databases using a simulation model. For the model,
they assumed a batch multiprogrammed environment with a fixed
multiprogramming level. They assumed that all service is obtained in one visit
to both the CPU and the I/O devices.** The service acquired at the CPU is
scheduled via a processor-sharing discipline so that if there are n jobs at the
CPU, then each job obtains 1/n of the processing power. Similarly, the I/O
scheduling is considered to be processor sharing, but a parameter is included
to represent having multiple paths to the I/O devices. This parameter, referred
to as the I/O overlap, effectively acts as a speedup factor for the I/O devices.
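
To make the processor-sharing discipline concrete, the following Python sketch
computes when each job finishes if all jobs arrive together and share one
server equally. It is an illustration under assumed simplifications, not a
reconstruction of the Ries and Stonebraker simulator: the function name is
hypothetical, arrivals are simultaneous, and the `speedup` argument stands in
for a factor such as the I/O overlap.

```python
from typing import List


def processor_sharing_finish_times(
    demands: List[float], speedup: float = 1.0
) -> List[float]:
    """Finish time of each job when all start at time 0 and share one server.

    With m unfinished jobs, each job is served at rate speedup / m.
    """
    order = sorted(range(len(demands)), key=lambda i: demands[i])
    finish = [0.0] * len(demands)
    clock = 0.0
    served = 0.0  # service every unfinished job has received so far
    active = len(demands)
    for i in order:
        # Time for the shortest remaining job to obtain the rest of its demand.
        clock += (demands[i] - served) * active / speedup
        served = demands[i]
        finish[i] = clock
        active -= 1
    return finish


if __name__ == "__main__":
    print(processor_sharing_finish_times([3.0, 1.0, 2.0]))
    print(processor_sharing_finish_times([3.0, 1.0, 2.0], speedup=2.0))
```

With demands of 3, 1, and 2 time units, the shortest job finishes at time 3
because it receives only one third of the server while three jobs are present;
doubling the speedup factor halves every finish time.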

An assumption is made in the initial model that all granules are of equal
size, though later a different model is considered where a transaction either
locks the whole database with a global lock or locks several of the smaller
granules. The locking routines are assumed to have a higher priority than
regular processing and the amount of service required by a transaction is
proportional to the number of entities requested.

*This accuracy is limited, of course, to the accuracy with which the workload
has been characterized.

**The same assumption can be made in local balance models without altering the
results.

The processing model assumes that if a transaction gets blocked on any of
its lock requests, it waits in a blocked queue until all of its locks are released
and then requests the locks again. The locking overhead that is incurred is
then twice what it would have been, had the first request been successful. The
overhead required to scan the blocked queue each time a transaction
completes and releases its granules is not accounted for in this model. The
decision of whether to grant or deny a lock request is modelled in one of two
ways. The model either assumes (1) that a subset of granules is requested all at
once or (2) that the granules are requested one at a time and that the requests
are uncorrelated (i.e., independent). The model calculates a probability of the
request being successful based on one of these two request strategy
assumptions.

One difficulty with this locking algorithm is that a request may experience
indefinite postponement. It is possible for a transaction to be continually
blocked and to remain in the blocked queue forever. Consider the following
simple example. Suppose that there are two granules $g_1$ and $g_2$, and
transaction $T_1$ has locked $g_1$ and $T_2$ has locked $g_2$. Then transaction
$T_3$ requests $g_1$ and $g_2$. $T_3$ is denied the request since $T_1$ has
locked $g_1$, and $T_3$ is placed in the blocked queue. The situation is then:

$T_1$ has lock on $g_1$
$T_2$ has lock on $g_2$
$T_3$ is blocked on $g_1$ and $g_2$

Now, suppose that a series of transactions arrives so that there is always a
request for $g_1$ alone and always a request for $g_2$ alone waiting to run.
When each request that has a lock on $g_1$ finishes and releases the lock on
$g_1$, there will still be a lock on $g_2$ and vice versa. Since there are
always jobs ready to request $g_1$ and $g_2$ alone, $T_3$ will never proceed
and hence is indefinitely postponed. From the system's point of view,
throughput is increased; however, from the user's point of view, response times
may be arbitrarily large. This is a consequence of the locking algorithm being
used, however, and not the model. Using a priority scheme could alleviate the
problem but this is not considered by the authors.
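
The indefinite postponement scenario above can be reproduced with a very small
discrete-time sketch in Python. The sketch is a simplification made for
illustration: time advances in unit steps, every single-granule transaction
holds its lock for `hold` steps, the two streams of single-granule requests are
staggered by `offset` steps, and a fresh single-granule request is assumed to be
waiting whenever a granule is released. The function name and parameters are
hypothetical.

```python
from typing import Optional


def simulate_postponement(
    steps: int, hold: int = 2, offset: int = 1
) -> Optional[int]:
    """Return the time at which T3 obtains both g1 and g2, or None if it never does.

    T1 initially holds g1 until time `hold`; T2 holds g2 until `hold + offset`.
    T3 needs both granules at once and is retried first whenever a lock is freed.
    """
    release_at = {"g1": hold, "g2": hold + offset}
    for t in range(1, steps + 1):
        free = [g for g, release_time in release_at.items() if release_time <= t]
        if len(free) == 2:
            return t  # both granules are free at once, so T3 can finally run
        for g in free:
            # T3 is still blocked on the other granule, so a waiting
            # single-granule transaction takes the freed granule instead.
            release_at[g] = t + hold
    return None


if __name__ == "__main__":
    print(simulate_postponement(1000))            # staggered releases
    print(simulate_postponement(1000, offset=0))  # simultaneous releases
```

With staggered releases the two granules are never free at the same instant, so
the transaction that needs both is postponed for the whole run, while the
single-granule transactions keep completing. If the releases happen to
coincide, the blocked transaction runs at the first release.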

The purpose of the simulation is to investigate the tradeoff between
concurrency control algorithm overhead and allowable parallelism. The results
of the study identify situations in which fine granularity is more appropriate.
For large transactions that touch many data items, fine granularity becomes too
expensive. A transaction that accesses half the database would spend a lot of
time locking each page. At the same time, little parallelism would be gained
since other transactions would most probably require access to data held by the
large transaction and be blocked. On the other hand, if granularity is coarse, a
small transaction that accesses only a small part of the database must lock a
much larger granule. The resultant loss in parallelism would be minimized
because the small transaction would only hold the lock for a short amount of
time. (Recall the assumption that the amount of service required is
proportional to the number of entities requested.) The probability of conflict
would be small and any waiting would be brief, since the length of time the
lock is held would be short.

The following conclusions were drawn from the study. A small number of
granules is appropriate (coarse granularity) when the number of entities
required by transactions varies in size and some transactions require many.
Fine granularity is appropriate when either all transactions are small and
access less than 1% of the database or the length of time locks are held is
extremely long and not proportional to the size of the transaction. (This was an
extension that was added to reflect the situation in which locks were held while
a user was prompted for information.) Fine granularity may also be appropriate
if the access patterns are random with no sequentiality. Ries also investigated
hierarchical locking schemes where certain locks represented a collection of
other locks that were lower in the hierarchy. This enabled a large transaction
to request the big lock instead of having to make many requests for the smaller
locks. In situations where the hierarchy was set up so that the big lock
represented those locks that were going to be locked anyway, the locking
overhead could be diminished. Ries only considered a two-level hierarchy where
the transaction locked either the entire database or each individual lock. These
results are summarized in more detail in [Rie79].

Simulation is traditionally very expensive. In fact, Ries stated that his
simulations were requiring "days" of computer time to complete [Rie81]. One
of the advantages of analytic models is that they are generally less expensive to
run than simulation models.

Analytic models have been proposed by Irani and Lin as a means of
investigating the tradeoff between locking overhead and increased concurrency
[IrLi78]. The major difficulty with their model is that the mean time until lock
release for a blocked request should depend on the level of multiprogramming
and the number of granules in the database, but in the model this parameter is
measured by simulation or empirical measurement and is assumed constant.
Irani and Lin consider two different models, one where the lock table is small
enough to be kept in core, and another where it is necessary for the lock table
to be stored on disk [IrLi78]. This makes a difference in the amount of locking
overhead attributed to each request. No mention was made in the paper about
where the list of transactions that are queued waiting for a blocked granule
would be kept. This list could be long and would most likely be stored on disk.
The overhead for having to access this list for queueing and dequeueing was not
accounted for in the model. Preliminary results in [IrLi78] agree well with
those in [Rie79].

Potier and Leblanc proposed a probabilistic analytic model for the locking
granularity problem [PoLe80]. In their model, transactions switch from a
BLOCKED state to an ACTIVE state and then back to the terminal. To solve the
model, the probability of being blocked is determined. However, the maximum
number of transactions that are released from the BLOCKED queue at each
departure is left as a control parameter. This parameter is a difficult quantity
to measure, let alone guess, and a bad estimate could lead to erroneous results.
Since their results are not compared with simulation results or with a real
system, the accuracy of the model is not known.

Many other analytic techniques that yield approximate solutions for
queueing network models have been proposed but have not yet been adapted or
applied to model concurrency control algorithms. Decomposition and iteration
are two techniques used to solve these models. They do not always yield exact
results; however, as the case study by Bard [Bar78] that is described below
points out, they can produce usefully accurate results. Chandy and Sauer
describe an approximate solution technique where the network is decomposed
into several parts and each part is replaced by a single composite queue that is
flow-equivalent to the subnetwork, making sure that the job-flow (throughput)
through the composite queue is equal to that through the subnetwork [ChSa78].
This can be done repeatedly. The subnetwork that is isolated using
decomposition must then be solved by an efficient algorithm for solving
subnetworks that arise from a decomposition of the original network [CHW75].

Smith and Browne used decomposition and iteration to deal with blocking
in software systems [SmBr80]. Their work seems to be more applicable to
studying congestion in operating systems than to the blocking problems in
database locking algorithms. The reason for this is that the number of locks
that jobs are contending for is smaller in the operating system environment
than in a database system with even moderately fine granularity, and their
technique is only useful if there are few contentious resources. Another reason
is that their technique does not really consider the details of the scheduling
algorithm at all. When a transaction is blocked it is sent to a service center
in the model where it waits for an amount of time that is assumed to be
exponential so that the system can be more easily solved. The model does not
lend any intuition as to how that wait would be affected if, for example, the
concurrency control algorithm were changed.

Decomposition techniques and others were used by Bard in an early case
study. Bard developed VM/370 Predictor, a tool for modelling VM/370, an
interactive, multiprogrammed, virtual machine operating system [Bar78].
VM/370 Predictor is composed of a data gathering mechanism, a data reduction
package, and an analytic model. The analytic model uses an approximate
solution technique that is based on a closed queueing network model with
several job classes. The contention for memory precludes a product form
solution; however, the steady state solution can be derived and the model can be
solved approximately using iteration and decomposition. Other approximations
are used in solving the model. The results obtained are compared with
benchmarked real systems and are good. CPU utilizations are within 5% and mean
response time predictions are within 30%.

Bard proposed a technique based on mean value analysis [ReLa78] to
approximate solutions of analytic models [Bar79]. This technique uses a delay
equation to relate the total average delay suffered by a job in a given queue to
the average length of that queue for a network with one fewer customer. In this
way all the performance measures can be "built up" starting from considering
the network with one customer. These techniques have been used for non-local
balance networks to achieve approximate, yet accurate performance
predictions.
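
The "building up" of performance measures can be illustrated with the standard
exact mean value analysis recursion for a single-class closed network of
queueing stations. The Python sketch below is offered as an illustration under
stated assumptions and is not taken from [ReLa78] or [Bar79]: stations are
assumed to be load-independent, `demands` holds a hypothetical total service
demand per station, and `think_time` is an optional terminal delay.

```python
from typing import List, Tuple


def exact_mva(
    demands: List[float], n_customers: int, think_time: float = 0.0
) -> Tuple[float, List[float], List[float]]:
    """Exact MVA for a single-class closed network of queueing stations.

    Returns (throughput, residence times, mean queue lengths) at population
    n_customers, built up from the empty network one customer at a time.
    """
    queue = [0.0] * len(demands)  # mean queue lengths with zero customers
    residence = [0.0] * len(demands)
    throughput = 0.0
    for n in range(1, n_customers + 1):
        # Delay equation: an arriving job sees the queue of the (n-1) network.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        # Little's law applied to the whole network.
        throughput = n / (think_time + sum(residence))
        # Little's law applied to each station.
        queue = [throughput * r for r in residence]
    return throughput, residence, queue


if __name__ == "__main__":
    x, r, q = exact_mva([1.0, 1.0], n_customers=2)
    print(x, r, q)
```

For two identical stations with unit demand and two customers, the recursion
gives a throughput of 2/3 and a mean queue length of one job at each station.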

Bard found that by making the assumption that the properties of the
network with n customers did not differ much from those of a network with n-1
customers, a fast approximation technique could be derived by replacing the
functions of n-1 customers in the mean value analysis equations with
functions of n customers, thus yielding equations that contained quantities
relating only to an n customer network. The resulting equations could be solved
with a simple iterative procedure. The technique is hypothesized to work well
for customer populations larger than six. Schweitzer improved the technique
with a slightly more detailed approximation of the n-1 customer function.* The
heuristic analytic technique discussed later in Chapter 2 and in Chapter 3 was
inspired by Bard's technique.
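
The approximation described above can be sketched as a fixed-point iteration in
Python. The code is an illustration under stated assumptions, not the procedure
of [Bar79]: it treats a single-class closed network of load-independent
stations, uses hypothetical names, and takes the commonly cited forms of the two
approximations, namely Q(n-1) ≈ Q(n) for Bard and Q(n-1) ≈ ((n-1)/n) Q(n) for
Schweitzer.

```python
from typing import List, Tuple


def approximate_mva(
    demands: List[float],
    n_customers: int,
    think_time: float = 0.0,
    schweitzer: bool = True,
    tol: float = 1e-10,
    max_iter: int = 100_000,
) -> Tuple[float, List[float]]:
    """Approximate MVA: solve only for the n-customer network by iteration.

    The queue length seen on arrival is estimated from the n-customer queue
    length itself, scaled by (n-1)/n for Schweitzer or by 1 for Bard.
    """
    k = len(demands)
    queue = [n_customers / k] * k  # initial guess: customers spread evenly
    scale = (n_customers - 1) / n_customers if schweitzer else 1.0
    for _ in range(max_iter):
        residence = [d * (1.0 + scale * q) for d, q in zip(demands, queue)]
        throughput = n_customers / (think_time + sum(residence))
        new_queue = [throughput * r for r in residence]
        if max(abs(a - b) for a, b in zip(new_queue, queue)) < tol:
            return throughput, new_queue
        queue = new_queue
    raise RuntimeError("approximate MVA did not converge")


if __name__ == "__main__":
    print(approximate_mva([1.0, 1.0], 2, schweitzer=True))
    print(approximate_mva([1.0, 1.0], 2, schweitzer=False))
```

For two identical unit-demand stations and two customers, the Schweitzer form
returns a throughput of 2/3, which is the exact value for this balanced
network, whereas the unscaled form returns 0.5. The gap is large here because
the population is very small, which is consistent with the remark that the
technique is expected to work well only for larger populations.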

The rest of this chapter and Chapter 3 describe two models that avoid
Irani and Lin's assumption that the mean time until lock release is a constant
independent of the number of customers and the number of granules. They
differ from Ries and Stonebraker's work because they are not simulation
studies. They use a heuristic analytical method of solution. A set of equations
is derived to describe the steady state of the system. These are then solved to
produce expected values for performance measures such as Time in System and
Throughput. The technique uses a combination of techniques described
elsewhere [CHW75, ChSa78, Bar79, BBC77] to solve the locking granularity
problem.

*Zahorjan discusses a further improvement for small numbers of customers when
there is more than one class [Zah80, pp. 145-150].

Section 2.2 presents a description of the algorithms and scheduling policies
used in the model. Section 2.3 gives detailed derivations of the essential
equations that describe the original model. Section 2.4 presents the
conditions that guarantee convergence of the iterative solution technique. The
iteration is shown to have a bounded number of fixed points and an algorithm is
presented that can be used to determine the uniqueness of a point found by the
technique. Sections 3.1 and 3.2 present results from an experiment designed to
test the validity of this new analytic technique. Section 3.3 presents
conclusions concerning the original model. Section 3.4 shows how to bound the
performance of a model that eliminates a basic assumption, namely that the
choice of any particular granule is equiprobable. In Section 3.5 an extension of
the model is given that uses an alternate job scheduling discipline.


2.2.  Description  of  the  Model 

The model is shown pictorially in Figure 2.1. The model is a single-class
model, so only one type of transaction or job can be modelled at a time. The
system has N terminals (hence N jobs), g granules, a queue for memory (called
the MEM_queue), and a Central Subsystem (CS). As many as ML (multiprogramming
limit) jobs can be served in parallel in the central subsystem. Service at
the terminals is No Queueing and service in the CS is Processor Sharing. A
service center that has a No Queueing scheduling discipline can be interpreted
as a service center with enough servers so that no one waits for service.
Service centers with the Processor Sharing discipline use a round-robin
scheduler where the time-slice awarded is infinitesimally small. Thus, if there
are n customers at a Processor Sharing service center, each job obtains 1/n of
the total processing capacity of the server. The granule queues and the
MEM_queue are First-Come-First-Served (FCFS).


The activity of jobs in this model proceeds as follows. Suppose a job is at a
terminal. At the end of its think time (assumed exponentially distributed with
mean Z) the job chooses a set of $\gamma$ granules. The choice of any particular
granule is assumed equiprobable; however, in section 3.4, this limitation will be
eased. If all the requested granules are free, i.e., not claimed by another job,
then the job locks each granule; otherwise the job queues for the required
granules. When a job gets to the head of the queue for all the required
granules, it becomes "available to run" and it proceeds to the MEM_queue.
While a job is "available to run" it effectively has set up a blockade at its
required granule queues since no other jobs can obtain these locks until the
"available to run" job completes service and releases the granules. If there are
fewer than ML jobs in the CS, then the job spends no time in the MEM_queue at
all, and proceeds to the CS queue. If there are ML jobs in the CS when the job
arrives in the MEM_queue, then the job queues until there are fewer than ML
jobs in the CS and the job is at the head of the line at the MEM_queue, at which
point the job enters the CS. Note that the number of jobs that can be served
simultaneously in the CS is limited by the smaller of ML (the multiprogramming
limit) and $\lfloor g/\gamma \rfloor$ (the largest number of jobs that can
simultaneously hold all their required granules). When a job enters the
MEM_queue, it has in effect set up a blockade behind itself so that no one else
may use the granules it requires until it has finished in the CS (and released
the granules).

Figure 2.1. A queueing network portrayal of the model (legend: release
resource, request resource).

Once  the  job  enters  the  CS  it  is  serviced  at  a  rate  that  is  determined  by  the 
number  of  jobs  then  in  the  CS.  It  is  assumed  that  the  last  bit  of  processing  in 
the CS includes releasing the locks and that the job then returns to the
terminal and begins the cycle again.

To solve this model for a typical computer system, a hierarchical decomposition
technique could be used in order to determine the load-dependent
inter-departure times, S[i], for the CS. These inter-departure times provide a
flexible method of modelling the competition for the remainder of system
resources, e.g., I/O channels, disks. They should include the overhead due to
contention from system routines doing the locking. The S[i] values give an
indication of the degree of parallelism that is allowed in the CS. For example,
if S[1] = 1, S[2] = .5, and S[3] = .33, then this indicates an extremely high
degree of allowable parallelism. In fact, in this case jobs are not affected by
the presence of other jobs.

The other extreme would occur when S[i] = 1 for all i. This function
represents a system where there is no parallelism allowed at all. An
intermediate amount of congestion would be found in a system whose
inter-departure times were found to be S[1] = 1, S[2] = .8, and S[3] = .6. A
wide range of system behaviors can be captured with this service function. For
the model, these inter-departure times will be assumed to be exponentially
distributed. The parameters for the model are given in Table 2.2.
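As a rough illustration of how the S[i] values encode parallelism, the following Python sketch computes n * S[n], the expected time for one particular job to complete when n jobs share the CS, for the three example service functions mentioned above. The function name is an assumption made for this illustration, and S[3] = 1/3 is used in place of the rounded value .33.

```python
def per_job_completion_times(S):
    """Expected time for one job to finish when n jobs share the CS.

    S[n - 1] holds the load-dependent inter-departure time S[n]; with n jobs
    present, one particular job needs about n * S[n] time units to complete.
    """
    return [n * s for n, s in enumerate(S, start=1)]


full_parallelism = [1.0, 0.5, 1.0 / 3.0]  # jobs do not slow each other down
no_parallelism = [1.0, 1.0, 1.0]          # the CS behaves like a single server
intermediate = [1.0, 0.8, 0.6]            # some contention between jobs

for name, S in [("full", full_parallelism),
                ("none", no_parallelism),
                ("intermediate", intermediate)]:
    print(name, per_job_completion_times(S))
```

With full parallelism the per-job time stays at 1 regardless of load; with no parallelism it grows linearly in the number of jobs.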

A job takes up residence in four places in this model: at the terminal, at
the granules, at the MEM_queue, and in the CS. The expected residence times
for a job on arrival in each of these four places are given the names
$w_{TERM}$, $w_g$, $w_{MEM}$, and $w_{CS}$ respectively. Note that when summed,
these residence times include all the waiting time and processing time for a
job. In addition the residence times are non-overlapping. The cycle time for a
job is the sum of these four residence times. All interesting performance
measures can be expressed in terms of the model parameters and these residence
times. Figure 2.2 shows a summary of a day in the life of a typical job in this
model.


parameter    interpretation

N            number of jobs
g            number of granules
$\gamma$     number of granules required by each job
ML           multiprogramming limit
Z            mean think time at terminals
S[i]         load-dependent inter-departure times for the CS

Table 2.2

Input parameters and their interpretation for the model.


DO all day;
    Think at terminal for an average time of Z;                        ($w_{TERM}$)
    Choose $\gamma$ granules, every subset of size $\gamma$ having equal probability;
    Get in queue at each chosen granule;
    Wait until lock is obtained at each required granule;              ($w_g$)
    Proceed to the MEM_queue;
    Wait until you are the head of the line and
        the number in CS < ML;                                         ($w_{MEM}$)
    Proceed to CS and get processed at a rate which
        depends on number then in CS;                                  ($w_{CS}$)
    Release lock on granules;
END DO;

Figure 2.2

A typical day in the life of a job in this model.


2.3. Derivation of the Equations

To solve this model, a number of equations have been formed that can be
solved iteratively for $w_g$, $w_{MEM}$, and $w_{CS}$. Trivially, $w_{TERM}$,
the expected time to wait when a job arrives at the terminal, is Z since there
are always enough terminals, and thus no queueing. The other equations will
turn out to be of the form:

$$
\begin{aligned}
w'_g &= f_1(w_g, w_{MEM}, w_{CS}) \\
w'_{MEM} &= f_2(w_g, w_{MEM}, w_{CS}) \\
w'_{CS} &= f_3(w_g, w_{MEM}, w_{CS})
\end{aligned}
\tag{1}
$$

Successive substitution will be used to solve the system of three non-linear
equations that results. The convergence criterion will be that the iteration
should terminate when $w_g$, $w_{MEM}$, and $w_{CS}$ differ from $w'_g$,
$w'_{MEM}$, and $w'_{CS}$ by a suitably small amount. (Functions $f_1$, $f_2$,
and $f_3$ also involve the other model parameters, namely N, g, $\gamma$, ML,
S[1], ..., S[ML], and Z, but these do not change during the iterative
solution.)
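The successive substitution scheme just described can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function `successive_substitution`, its tolerance and iteration cap, and the linear map `toy` (a stand-in for the real functions of the model) are all hypothetical and are not taken from the thesis.

```python
def successive_substitution(f, w0, tol=1e-9, max_iter=10_000):
    """Iterate w' = f(w) until every component changes by less than tol.

    f maps a tuple (w_g, w_mem, w_cs) to a new tuple of the same shape.
    Returns the final tuple and the number of iterations used.
    """
    w = tuple(w0)
    for iteration in range(1, max_iter + 1):
        w_new = tuple(f(w))
        if max(abs(a - b) for a, b in zip(w_new, w)) < tol:
            return w_new, iteration
        w = w_new
    raise RuntimeError("no convergence within max_iter iterations")


def toy(w):
    """Hypothetical stand-in for (f1, f2, f3): a simple contraction."""
    w_g, w_mem, w_cs = w
    return (0.5 * w_cs + 1.0, 0.25 * w_g, 0.5 * w_mem + 2.0)


solution, steps = successive_substitution(toy, (0.0, 0.0, 0.0))
print(solution, steps)
```

For the real model, `toy` would be replaced by a function that evaluates equations (3), (4) and (6) from the current residence times.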

In the next section, it will be shown how these equations can actually be
written as one function of one variable and subsequently several results
concerning the convergence properties of (1) will be obtained. To understand how
the equations model the real system and to understand their derivation, it is
simpler to view them as three functions of three variables.

All three important equations will be stated in such a way that the system
under consideration appears to have N-1 customers. This is because the
required residence times are those expected for a job on its arrival to the
queue. When a job arrives at a queue, it can find at most N-1 customers ahead
of it. In local balance networks, the distribution of system state at arrival
instants is known to be precisely the equilibrium distribution with the arriving
customer removed [SeMi81], and this is conjectured to be a reasonable
approximation for non-local balance networks.

Next, a few equations to be used later will be established. The first
equation is for Occ(n), the probability that there are n jobs already
"available to run" when a job arrives at the MEM_queue or the CS queue. To be
"available to run", a job must be at the head of the queue at each of its
requested granules. It will be shown that:

$$
Occ(n) =
\begin{cases}
\displaystyle\sum_{m=n}^{N-1} p(m; N-1) \cdot sc(n; m)
  & \text{if } 0 \le n \le \min\left(N-1, \lfloor (g-\gamma)/\gamma \rfloor\right) \\
0 & \text{otherwise}
\end{cases}
\tag{2}
$$

where p(m; N-1) is the probability that there are m jobs not at the terminals
when there are N-1 jobs overall. p(m; N-1) will be derived below. sc(n; m) is
the probability that n jobs are "available to run" given that the scheduler has
already considered m jobs.

To motivate equation (2), note that Occ(n) is:

$$
Occ(n) = \sum_{m=n}^{N-1}
\Pr\left[\text{there are } m \text{ jobs not at the terminals}\right] \cdot
\Pr\left[\text{exactly } n \text{ jobs are available to run} \mid m \text{ jobs are not at the terminals}\right]
$$

p(m; N-1) is derived next. This is equal to the probability that there are
N-1-m jobs at the terminals when there are N-1 jobs overall.

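The proportion of time spent at the terminals by jobs is:

$$
\frac{w_{TERM}}{w_{TERM} + w_g + w_{MEM} + w_{CS}}
$$

So, p(m; N-1) is approximated by:

$$
p(m; N-1) = \binom{N-1}{m}
\left(1 - \frac{w_{TERM}}{w_{TERM} + w_g + w_{MEM} + w_{CS}}\right)^{m}
\left(\frac{w_{TERM}}{w_{TERM} + w_g + w_{MEM} + w_{CS}}\right)^{N-1-m}
$$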
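The binomial approximation for p(m; N-1) is easy to evaluate numerically. The sketch below is illustrative only; the function and argument names are assumptions, and the residence times passed in are made-up values rather than results from the model.

```python
from math import comb


def p_not_at_terminals(m, n_others, w_term, w_g, w_mem, w_cs):
    """Binomial approximation of p(m; n_others).

    n_others is the number of other jobs in the system (N - 1 in the text);
    the result is the probability that m of them are not at the terminals.
    """
    if m < 0 or m > n_others:
        return 0.0
    # Fraction of a job's cycle that is spent thinking at the terminal.
    t = w_term / (w_term + w_g + w_mem + w_cs)
    return comb(n_others, m) * (1.0 - t) ** m * t ** (n_others - m)


# Example: N = 5 jobs, so an arriving job sees 4 others.
probs = [p_not_at_terminals(m, 4, 3.0, 1.0, 1.0, 1.0) for m in range(5)]
print(probs)
```

Because the expression is a binomial distribution in m, the probabilities over m = 0, ..., N-1 sum to one.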


The derivation of sc(n; m) is as follows:

$$
sc(n; m) = \sum_{x=0}^{g} p_{gf}(n; x, m)
$$

where $p_{gf}(n; x, m)$ is the probability that n jobs are "available to run"
with x granules still free given that m jobs have been considered by the
scheduler*.

$$
p_{gf}(n; x, m) =
\begin{cases}
1 & \text{if } m = n = 0,\ x = g - \gamma \\
0 & \text{if } n < 0 \text{ or } m < 0 \text{ or } n > (g-\gamma)/\gamma \text{ or } g - (n \cdot \gamma) < x \\
p_{gf}(n-1; x+\gamma, m-1) \cdot p_B(0; x+\gamma)
  + \displaystyle\sum_{i=0}^{\gamma-1} p_{gf}(n; x+i, m-1) \cdot p_B(\gamma-i; x+i)
  & m = 1, 2, \ldots, N-1
\end{cases}
$$

$p_B(i; x)$ is the probability that exactly i requests are blocked when a job
arrives and requests $\gamma$ granules, given that there are x granules free
when the request is made. $p_B(i; x)$ will be derived later.

*This technique of forming recursive equations is based on a similar technique
used by Brown, Browne, and Chandy for modelling memory management systems
[BBC77].

To justify the equation for $p_{gf}(n; x, m)$, suppose the scheduler has
considered m-1 jobs and is now considering the m-th job. There are two ways the
scheduler can have n jobs "available to run" with x granules still free: (1)
there are n-1 jobs "available to run" with $x+\gamma$ granules free after
considering m-1 jobs and the m-th job does not get blocked on any of its
$\gamma$ granule requests, or (2) there are n jobs "available to run" with x+i
granules free after considering m-1 jobs and the m-th job gets blocked on
$\gamma-i$ of its $\gamma$ granule requests. (If a job gets blocked on
$\gamma-i$ of its requests, then it does not get blocked on i requests, which
means that i granules that used to be free no longer are.) Case (1) corresponds
to the first term on the right hand side of the equation for
$p_{gf}(n; x, m)$ and case (2) corresponds to the second term. To obtain
sc(n; m), $p_{gf}(n; x, m)$ is simply summed over all possibilities of x, the
number of granules that are free. That completes the description of Occ(n).
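The recursion for the scheduler probabilities can be sketched in Python as follows. This is an illustrative reading of the definitions above, not the author's program: the names are assumptions, the blocking probability uses the hypergeometric form that the text derives later, the guard `x < 0` is added for safety, and the sum over blocked requests in case (2) is taken over i = 0, ..., gamma-1.

```python
from functools import lru_cache
from math import comb


def make_scheduler_model(g, gamma):
    """Return (p_b, sc) for g granules and gamma granules requested per job."""

    def p_b(i, x):
        # Probability that exactly i of gamma requests are blocked when x of
        # the g granules are free (hypergeometric form derived later).
        if not (0 <= i <= gamma and 0 <= x <= g):
            return 0.0
        return comb(x, gamma - i) * comb(g - x, i) / comb(g, gamma)

    @lru_cache(maxsize=None)
    def p_gf(n, x, m):
        # n jobs available to run, x granules free, m jobs considered so far.
        if n < 0 or m < 0 or x < 0 or n * gamma > g - gamma or x > g - n * gamma:
            return 0.0
        if m == 0:
            return 1.0 if (n == 0 and x == g - gamma) else 0.0
        # Case (1): the m-th job is not blocked and takes gamma free granules.
        total = p_gf(n - 1, x + gamma, m - 1) * p_b(0, x + gamma)
        # Case (2): the m-th job is blocked on gamma - i requests, takes i granules.
        for i in range(gamma):
            total += p_gf(n, x + i, m - 1) * p_b(gamma - i, x + i)
        return total

    def sc(n, m):
        return sum(p_gf(n, x, m) for x in range(g + 1))

    return p_b, sc


def occ(n, n_others, t, sc):
    """Occ(n): sum over m of p(m; n_others) * sc(n; m), with t the fraction of
    a cycle spent at the terminals (binomial approximation for p)."""
    return sum(comb(n_others, m) * (1.0 - t) ** m * t ** (n_others - m) * sc(n, m)
               for m in range(n, n_others + 1))


p_b, sc = make_scheduler_model(g=10, gamma=3)
print([occ(n, 5, 0.4, sc) for n in range(3)])
```

A useful sanity check is that sc(n; m) sums to one over n for each m, and that Occ(n) sums to one over n.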

$w_{CS}$ will be derived next. Remember that $w_{CS}$ is the expected residence
time in the CS that a job can expect when it arrives at the CS queue and is:

$$
w_{CS} = \sum_{n=0}^{ML-1} Occ(n) \cdot (n+1) \cdot S[n+1]
+ \left[ 1 - \sum_{n=0}^{ML-1} Occ(n) \right] \cdot ML \cdot S[ML]
\tag{3}
$$

where S[x] is the load-dependent inter-departure time for the CS when
there are x jobs present in the CS. The expected time it takes for a particular
job to complete in the CS (assuming that the multiprogramming level remains
at n+1) is (n+1) * S[n+1]. Recall from above that Occ(n) is the probability
that there are n jobs "available to run". If n < ML, Occ(n) gives the probability
that the number of jobs in the CS is n. Since $w_{CS}$ is the residence time
expected when a job arrives, if ML or more jobs are "available to run" then at
least one job is in the MEM_queue.

Note that

$$
1 - \sum_{n=0}^{ML-1} Occ(n)
$$

is the probability that at least ML jobs are "available to run". If there are ML
or more jobs "available to run", then the arriving job must have arrived from
the MEM_queue and will require ML * S[ML] time units to complete its service
in the CS since the CS is saturated with ML jobs. The above equation for
$w_{CS}$, (3), is the first important equation needed to solve this model.
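Equation (3) can be evaluated directly once the Occ(n) values are known. The following Python sketch assumes the probabilities are supplied as a list; the function name, the list layout, and the sample numbers are hypothetical.

```python
def w_cs(occ, S, ML):
    """Expected residence time in the CS, following equation (3).

    occ[n] is Occ(n) for n = 0, 1, ...; S[i - 1] holds S[i] for i = 1..ML.
    """
    # Jobs that find n < ML others in the CS run at multiprogramming level n + 1.
    below_limit = sum(occ[n] * (n + 1) * S[n] for n in range(min(ML, len(occ))))
    # Otherwise the CS is saturated with ML jobs.
    p_saturated = 1.0 - sum(occ[:ML])
    return below_limit + p_saturated * ML * S[ML - 1]


print(w_cs([0.5, 0.3, 0.2], [1.0, 0.8, 0.6], ML=2))
```

When the service function allows full parallelism, so that n * S[n] = 1 for every n, the result is 1 regardless of the Occ(n) values.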


The equation for $w_{MEM}$, the residence time that a job can expect when it
arrives at the MEM_queue, is as follows:*

$$
w_{MEM} = \sum_{n=ML}^{\min\left(N-1,\ \lfloor (g-\gamma)/\gamma \rfloor\right)}
(n - ML + 1) \cdot Occ(n) \cdot S[ML]
\tag{4}
$$

*Note that when ML is equal to N so that there are enough memory partitions for
all the jobs to run, $w_{MEM}$ evaluates to zero as it should since no jobs will
queue for memory.

This equation can be explained by noting that if there is anyone in queue
at the MEM_queue then they will proceed at a rate of 1 / S[ML], i.e., each job in
queue ahead of the arriving job will require S[ML] time units to advance one
position in the FCFS queue. This is because S[ML] is the time between
departures in the CS when there are ML jobs in the CS, and when anyone is in the
MEM_queue there are ML jobs in the CS. If there are n jobs "available to run"
and n is greater than ML, then there are n-ML jobs in the MEM_queue. Since
inter-departure times in the CS are assumed to be exponentially distributed, the
expected residual life of the job in service at the time of arrival to the
saturated CS is S[ML], and equation (4) follows. Equation (4) is the second
important equation.
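Equation (4) is a short sum and can be sketched in Python as below. As before, the function name and the list representation of Occ(n) are assumptions made for illustration, and the numbers are invented.

```python
def w_mem(occ, S, ML):
    """Expected residence time in the MEM_queue, following equation (4).

    occ[n] is Occ(n); the list is assumed to stop at the largest n with
    Occ(n) > 0. S[i - 1] holds S[i] for i = 1..ML.
    """
    # With n >= ML jobs available to run, the arrival waits for n - ML + 1
    # departures from the saturated CS, each taking S[ML] on average.
    return sum((n - ML + 1) * occ[n] * S[ML - 1] for n in range(ML, len(occ)))


print(w_mem([0.5, 0.3, 0.2], [1.0, 0.8, 0.6], ML=2))
print(w_mem([0.5, 0.3, 0.2], [1.0, 0.8, 0.6], ML=3))
```

The second call illustrates the footnoted remark: when the multiprogramming limit is large enough that no job ever queues for memory, the sum is empty and the wait is zero.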

The third and final equation is for $w_g$, the expected residence time in the
granule queue before a job can proceed to the MEM_queue. First, mw(i) is
derived, which is the mean wait incurred while the job is waiting for all its
granule requests to be granted, given that the arriving job is blocked on i out
of its $\gamma$ granule requests. The job is therefore proceeding in "parallel"
in each of i granule queues.

The assumption that the wait until a particular granule is locked is
deterministic will be made. The case where the wait was assumed to be
exponential was also investigated, but the performance predictions arising from
the model appeared pessimistic. The deterministic assumption is intuitively
optimistic. This is because in the deterministic case the wait at each granule
is the same and the wait does not depend on the number of granules at which the
job is blocked by other jobs. Therefore, mw(i) = X and it appears that all
granules are obtained after X time units. If the wait was anything but
deterministic then there would be a higher variance in the waiting times and
this would lead to a more pessimistic performance prediction. Since better
results were obtained with the assumption that the wait is deterministic, from
this point on in this chapter, it will be assumed that mw(i) = X.

The proper choice of X should be the average waiting time in queue at a
granule. If $T_g$ is the average time required to move up one position in queue,
and r is the expected residual life divided by the expected lifetime of the job
that has the granule locked, then

$$
X = \sum_{k=1}^{N-1} p_{gran}(k) \cdot (r + k - 1) \cdot T_g
\tag{5}
$$

where $p_{gran}(k)$ is the probability that there are k jobs in queue at a
particular granule when the job arrives:

$$
p_{gran}(k) = \sum_{m=k}^{N-1} p(m; N-1) \cdot p_g(k; m)
$$

where $p_g(k; m)$ is the probability of finding k requests in queue at a
particular granule on arrival given that there are m jobs that are not at the
terminals. The probability that a particular granule is chosen when a job makes
its requests is $\gamma/g$ and $p_g(k; m)$ is a simple binomial expression for
the fact that k jobs chose the specified granule and the m-k other jobs did not.

$$
p_g(k; m) = \binom{m}{k} \left(\frac{\gamma}{g}\right)^{k}
\left(1 - \frac{\gamma}{g}\right)^{m-k}
$$

The average time between completions in the CS is given by:

$$
T_{CS} = \frac{w_{TERM} + w_g + w_{MEM} + w_{CS}}{N}
$$

which is the reciprocal of the throughput of the system with N jobs present.

Whenever a job completes at the CS it releases all of its $\gamma$ granules so a
particular granule is released with probability $\gamma/g$. $T_g$, the average
time required to move up one position in queue, is $T_{CS}$ expanded by the
reciprocal of $\gamma/g$ or:

$$
T_g = \frac{g}{\gamma} \cdot T_{CS}
$$

The residual life divided by the expected total service time of the job in service
when the arriving job enters the FCFS granule queue is called r. So, when a job
arrives at a granule queue and finds k jobs ahead of it, it can expect to wait for
the rest of the job at the head of the queue which is in service (r), plus each of
the other k-1 jobs ahead of it to finish. Each of these k-1 jobs completes once
every $T_g$ time units on average. The procedure used for obtaining values for r
is described in section 3.1. That completes the description of mw(i).
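Two of the building blocks above, the time to advance one queue position and the binomial queue-length probability at a granule, can be sketched in Python as follows. The function names and the sample residence times are assumptions for illustration only.

```python
from math import comb


def mean_time_per_queue_position(w_term, w_g, w_mem, w_cs, n_jobs, g, gamma):
    """T_g: mean time between CS completions, scaled by g / gamma."""
    t_cs = (w_term + w_g + w_mem + w_cs) / n_jobs  # reciprocal of throughput
    return t_cs * g / gamma


def p_g(k, m, g, gamma):
    """Probability that k of the m jobs away from the terminals chose a given
    granule, each choosing it independently with probability gamma / g."""
    if k < 0 or k > m:
        return 0.0
    q = gamma / g
    return comb(m, k) * q ** k * (1.0 - q) ** (m - k)


print(mean_time_per_queue_position(3.0, 1.0, 1.0, 1.0, n_jobs=6, g=10, gamma=2))
print([p_g(k, 4, g=10, gamma=2) for k in range(5)])
```

With a cycle time of 6 and 6 jobs, a completion occurs once per time unit; since each completion releases a given granule with probability 2/10, a particular granule queue advances once every 5 time units on average.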

There is one more equation that is needed to describe $w_g$. It is for
ProbB(i; m), the probability that exactly i requests are blocked when a job
arrives and requests $\gamma$ granules, given that there are m jobs in the
system that have already made their requests ahead of the arriving job.

$$
ProbB(i; m) =
\begin{cases}
1 & \text{if } i = 0,\ m = 0 \\
0 & \text{if } i < 0 \text{ or } m < 0 \text{ or } i > \gamma \\
\displaystyle\sum_{x=\max(g - (m \cdot \gamma),\, 0)}^{g} p_f(x; m) \cdot p_B(i; x)
  & m = 1, \ldots, N-1
\end{cases}
$$

where $p_f(x; m)$ is the probability that x granules are free given that m
jobs have already made their requests.

$$
p_f(x; m) =
\begin{cases}
1 & \text{if } x = g,\ m = 0 \\
0 & \text{if } x \ne g,\ m = 0 \text{ or } x < 0 \text{ or } m < 0 \text{ or } x > g \\
\displaystyle\sum_{i=0}^{\gamma} p_f(x+i; m-1) \cdot p_B(\gamma-i; x+i)
  & m = 1, \ldots, N-1
\end{cases}
$$

and $p_B(i; x)$ is the probability that exactly i requests are blocked when a job
arrives and requests $\gamma$ granules given that x are free. It will be derived
below.

The justification for $p_f(x; m)$ is that the only way for there to be x granules
free after looking at m jobs' requests is for there to have been x+i granules
free after looking at m-1 jobs' requests and for the m-th job to take i of those
free granules, i.e., to be blocked on $\gamma-i$ of its requests. The probability
that this happens when a job makes its requests (and there are x+i granules that
are free) is $p_B(\gamma-i; x+i)$. The probability that x+i granules are free
after considering m-1 jobs' requests is $p_f(x+i; m-1)$ and the definition of
$p_f(x; m)$ follows. $p_B(i; x)$ is the probability that exactly i requests are
blocked when a job arrives and requests $\gamma$ granules given that x are free.
Recall that this formula was used earlier when sc(n; m) was computed. It is
given by:

$$
p_B(i; x) = \frac{\binom{x}{\gamma-i} \binom{g-x}{i}}{\binom{g}{\gamma}}
$$

Since it is given that x granules are free, there must be g-x granules that
are blocked. The probability that i requests are blocked when a job arrives and
requests $\gamma$ granules is obtained by noticing that there are
$\binom{x}{\gamma-i}$ ways to choose the $\gamma-i$ granules from the x free
granules and $\binom{g-x}{i}$ ways to choose the i granules at which the job is
blocked, from a total of $\binom{g}{\gamma}$ ways to make $\gamma$ requests
from g granules. The formula follows.
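The three quantities just defined fit together as a small recursion. The Python sketch below is one hedged reading of them: the names are assumptions, ProbB for m = 0 is taken to be 1 when i = 0 and 0 otherwise, and out-of-range arguments to the hypergeometric term are treated as probability zero.

```python
from functools import lru_cache
from math import comb


def make_blocking_model(g, gamma):
    """Return (p_b, p_f, prob_b) for g granules and gamma requests per job."""

    def p_b(i, x):
        # Hypergeometric: i of the gamma requests fall on the g - x blocked
        # granules and gamma - i fall on the x free ones.
        if not (0 <= i <= gamma and 0 <= x <= g):
            return 0.0
        return comb(x, gamma - i) * comb(g - x, i) / comb(g, gamma)

    @lru_cache(maxsize=None)
    def p_f(x, m):
        # Probability that x granules are free after m jobs made requests.
        if x < 0 or m < 0 or x > g:
            return 0.0
        if m == 0:
            return 1.0 if x == g else 0.0
        # The m-th job took i free granules and was blocked on gamma - i.
        return sum(p_f(x + i, m - 1) * p_b(gamma - i, x + i)
                   for i in range(gamma + 1))

    def prob_b(i, m):
        # Probability that an arrival is blocked on exactly i requests.
        if i < 0 or m < 0 or i > gamma:
            return 0.0
        if m == 0:
            return 1.0 if i == 0 else 0.0
        lowest_free = max(g - m * gamma, 0)
        return sum(p_f(x, m) * p_b(i, x) for x in range(lowest_free, g + 1))

    return p_b, p_f, prob_b


p_b, p_f, prob_b = make_blocking_model(g=10, gamma=3)
print([prob_b(i, 4) for i in range(4)])
```

For any fixed m, the values of ProbB(i; m) over i = 0, ..., gamma form a probability distribution, which gives a quick check on an implementation.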

All the pieces are available now so that $w_g$ can be described.

$$
w_g = \sum_{i=1}^{\gamma} \left[ \sum_{m=1}^{N-1} ProbB(i; m) \cdot p(m; N-1) \right] \cdot mw(i)
\tag{6}
$$

This is explained by noting that:

$$
w_g = \sum_{i=1}^{\gamma}
\Pr\left[\text{a job gets blocked on } i \text{ requests}\right] \cdot
\left(\text{mean wait given that a job is blocked on } i \text{ of its } \gamma \text{ requests}\right)
$$


That completes the derivation of the equations. In the next section several
properties of these equations will be revealed by first showing that they can be
combined to produce one function in one variable instead of three functions in
three variables. Then, using the new function, several results concerning the
uniqueness of fixed points and convergence of the iteration are investigated.

2.4. Properties of the Iteration Equations*

The results of this section depend on the reduction of the system of three
equations in three variables defining the model (1) to a single equation in one
variable: h(t) = t. Having done so, this equation is analyzed to show that a
fixed point or solution always exists and that under the condition that
(n+1) * S[n+1] increases in n (satisfied under realistic circumstances)**, the
simple iteration $t_{k+1} = h(t_k)$ converges. It will be shown that if this
iteration converges to the same fixed point for the starting values of
$t_0 = 0$ and $t_0 = 1$, then the solution of the model is unique. If it does
not converge to the same fixed point for both starting values, this would imply
the existence of more than one fixed point for the system of equations. It will
be shown that there are at most 3N-2 fixed points for these equations.
Empirically, in several hundred experiments, more than one fixed point has never
been observed. Further, the simple form of this function, h(t), allows some
qualitative properties of the model to be recognized.

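The crucial observation is to notice that each of the three functions can
be written in the form

$$
f_i(w_g, w_{MEM}, w_{CS}) = F_i\left( \frac{w_{TERM}}{w_{TERM} + w_g + w_{MEM} + w_{CS}} \right)
$$

From (1),

$$
\begin{aligned}
w_g &= F_1\left( \frac{w_{TERM}}{w_{TERM} + w_g + w_{MEM} + w_{CS}} \right) \\
w_{MEM} &= F_2\left( \frac{w_{TERM}}{w_{TERM} + w_g + w_{MEM} + w_{CS}} \right) \\
w_{CS} &= F_3\left( \frac{w_{TERM}}{w_{TERM} + w_g + w_{MEM} + w_{CS}} \right)
\end{aligned}
$$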


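The reduction follows from the lemma below.

Lemma 1: Suppose there are m continuous functions $F_j : [0,1] \to [0,\infty)$.
Define associated functions $f_j : [0,\infty)^m \to [0,\infty)$ by

$$
f_j(x_1, x_2, \ldots, x_m) = F_j\left( \frac{x_0}{x_0 + \sum_{i=1}^{m} x_i} \right)
$$

for some fixed $x_0 > 0$. Then $x^*_j = f_j(x^*_1, x^*_2, \ldots, x^*_m)$,
$1 \le j \le m$, iff $x^*_j = F_j(t^*)$, where $t^*$ is such that if h(t) is
set equal to

$$
h(t) = \frac{x_0}{x_0 + \sum_{i=1}^{m} F_i(t)},
$$

then $h(t^*) = t^*$.

Proof: (Sufficiency) Suppose that $h(t^*) = t^*$ and $x^*_i = F_i(t^*)$. Then

$$
\begin{aligned}
f_j(x^*_1, x^*_2, \ldots, x^*_m) &= f_j(F_1(t^*), F_2(t^*), \ldots, F_m(t^*)) \\
&= F_j\left( \frac{x_0}{x_0 + \sum_{i=1}^{m} F_i(t^*)} \right) \\
&= F_j(h(t^*)) = F_j(t^*) = x^*_j
\end{aligned}
$$

(Necessity) Suppose that $x^*_j = f_j(x^*_1, x^*_2, \ldots, x^*_m)$,
$1 \le j \le m$. Set

$$
t^* = \frac{x_0}{x_0 + \sum_{i=1}^{m} x^*_i}.
$$

Then

$$
F_j(t^*) = F_j\left( \frac{x_0}{x_0 + \sum_{i=1}^{m} x^*_i} \right)
= f_j(x^*_1, x^*_2, \ldots, x^*_m) = x^*_j
$$

and

$$
h(t^*) = \frac{x_0}{x_0 + \sum_{i=1}^{m} F_i(t^*)}
= \frac{x_0}{x_0 + \sum_{i=1}^{m} x^*_i} = t^*. \qquad \blacksquare
$$

*The work described in section 2.4 was done jointly with L. Bos.

**This follows from the fact that Time in System as a function of the population
is increasing for real systems.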


Theorem 2: There exists at least one and at most 3N-2 fixed points of the
equations defining our model.

Proof: The original system has a fixed point iff the function

$$
h(t) = \frac{w_{TERM}}{w_{TERM} + \sum_{i=1}^{3} F_i(t)}
$$

has one. Here t is set equal to

$$
\frac{w_{TERM}}{w_{TERM} + w_g + w_{MEM} + w_{CS}}
$$

However, each of the $F_i(t)$ is positive valued and hence
$h : [0,1] \to [0,1]$. Since h is easily seen to be continuous in t, it must
have a fixed point, i.e., its graph must somewhere cross the diagonal of the
square [0,1] x [0,1]. By inspection $F_2$ and $F_3$ are polynomials of degree
N-1 and $F_1$ is of the form of a polynomial of degree 3(N-1) divided by one of
degree 2(N-1). Thus, solving for t in h(t) = t is equivalent to solving a
polynomial of degree 3(N-1)+1 and there are at most 3N-2 fixed points.
$\blacksquare$


The fixed points of h(t) can easily be found by any of the many zero finding
algorithms. The main theoretical interest would lie in its uniqueness, but this is
equivalent to the generally difficult problem of showing that a certain
polynomial has a single root in the interval [0,1]. However, it will be shown that
under a non-restrictive condition the iteration $t_{k+1} = h(t_k)$,
$t_0 \in [0,1]$, converges. As previously mentioned this yields a simple test of
uniqueness. For logical completeness take $t^*$, the smallest fixed point of h,
to be the solution of the model. It will be shown that if (n+1) * S[n+1] is
increasing, then $F_1(t) + F_2(t) + F_3(t)$ is a decreasing function and
consequently h(t) is increasing.

Lemma 3: Suppose that $h : [0,1] \to [0,1]$ is continuous and non-decreasing. Let
the sequence $\{t_k\}$ be defined by $t_{k+1} = h(t_k)$, $t_0 \in [0,1]$. Then, if
$h(t_0) > t_0$, $\{t_k\}$ increases to the first fixed point of h to the right of
$t_0$ and if $h(t_0) < t_0$, $\{t_k\}$ decreases to the first fixed point to the
left of $t_0$.

Proof: Assume that $h(t_0) > t_0$, the other case being similar. Let $t^*$ be the
first fixed point of h to the right of $t_0$. Then by induction it can be shown
that $t_k \le t^*$, $0 \le k < \infty$. Assuming that $t_k \le t^*$,
$t_{k+1} = h(t_k) \le h(t^*) = t^*$ since h is non-decreasing.

Thus, $t_{k+1} = h(t_k) \ge t_k$, for otherwise $h(t_0) - t_0 > 0$ and
$h(t_k) - t_k < 0$ and so somewhere between $t_0$ and $t_k$, $h(t) - t = 0$,
contradicting the definition of $t^*$. So, $\{t_k\}$ is a bounded increasing
sequence and therefore has a limit. Since this limit must clearly be a fixed
point of h, $t_k$ increases to $t^*$. $\blacksquare$

For the special case of the locking problem being modelled, casual observation
reveals that h(0) > 0 and h(1) < 1. Thus, the iteration $t_{k+1} = h(t_k)$
with $t_0 = 0$ converges to the smallest fixed point of h and that with
$t_0 = 1$ converges to the largest. This, then, is the test for uniqueness: if
these two limits are equal, the smallest and largest fixed points of h coincide
and hence there is exactly one fixed point.
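The uniqueness test can be sketched in Python for any continuous, non-decreasing h on [0,1] with h(0) > 0 and h(1) < 1. The two example functions below are hypothetical stand-ins chosen only to show both outcomes; they are not the h(t) of the locking model, and the tolerances are arbitrary assumptions.

```python
import math


def iterate_to_fixed_point(h, t0, tol=1e-12, max_iter=100_000):
    """Run t_{k+1} = h(t_k) from t0 until successive values differ by < tol."""
    t = t0
    for _ in range(max_iter):
        t_next = h(t)
        if abs(t_next - t) < tol:
            return t_next
        t = t_next
    raise RuntimeError("iteration did not converge")


def fixed_point_is_unique(h, agree=1e-6):
    """Compare the limits reached from t0 = 0 and from t0 = 1."""
    smallest = iterate_to_fixed_point(h, 0.0)
    largest = iterate_to_fixed_point(h, 1.0)
    return abs(largest - smallest) < agree, smallest, largest


def h_single(t):
    # Non-decreasing, with the single fixed point t = 0.5.
    return 0.25 + 0.5 * t


def h_multiple(t):
    # Non-decreasing, with three fixed points (near 0.05, at 0.5, near 0.95).
    return 0.5 + 0.45 * math.tanh(10.0 * (t - 0.5))


print(fixed_point_is_unique(h_single))
print(fixed_point_is_unique(h_multiple))
```

If the two limits differ, the iteration has found the smallest and the largest fixed points, so more than one solution exists.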

The monotonicity of $F_1(t) + F_2(t) + F_3(t)$ follows from the following
lemmata.

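Lemma 4:

$$
F_2(t) + F_3(t) = \sum_{m=0}^{N-1} p(m; N-1) \cdot b_m
$$

where $b_m = \sum_{n=0}^{m} sc(n; m) \cdot a_n$ and $a_n = (n+1) \cdot S[n+1]$.

Proof: Consider the case $N-1 \ge \lfloor (g-\gamma)/\gamma \rfloor$, the case
$N-1 < \lfloor (g-\gamma)/\gamma \rfloor$ being simpler. Now

$$
F_3(t) = w_{CS}(t) = \sum_{n=0}^{ML-1} Occ(n) \cdot (n+1) \cdot S[n+1]
+ \sum_{n=ML}^{\lfloor (g-\gamma)/\gamma \rfloor} Occ(n) \cdot ML \cdot S[ML]
$$

and

$$
F_2(t) = w_{MEM}(t) = \sum_{n=ML}^{\lfloor (g-\gamma)/\gamma \rfloor} (n - ML + 1) \cdot Occ(n) \cdot S[ML].
$$

Hence, noting that S[n] for n > ML is equal to S[ML]:

$$
\begin{aligned}
F_2(t) + F_3(t) &= \sum_{n=0}^{\lfloor (g-\gamma)/\gamma \rfloor} Occ(n) \cdot (n+1) \cdot S[n+1] \\
&= \sum_{n=0}^{\lfloor (g-\gamma)/\gamma \rfloor} Occ(n) \cdot a_n \\
&= \sum_{n=0}^{\lfloor (g-\gamma)/\gamma \rfloor} a_n \sum_{m=n}^{N-1} p(m; N-1) \cdot sc(n; m)
\end{aligned}
$$

Switching the order of summation (see Figure 2.3),

$$
\begin{aligned}
&= \sum_{m=0}^{\lfloor (g-\gamma)/\gamma \rfloor} \sum_{n=0}^{m} p(m; N-1) \cdot sc(n; m) \cdot a_n
 + \sum_{m=\lfloor (g-\gamma)/\gamma \rfloor + 1}^{N-1} \sum_{n=0}^{\lfloor (g-\gamma)/\gamma \rfloor} p(m; N-1) \cdot sc(n; m) \cdot a_n \\
&= \sum_{m=0}^{\lfloor (g-\gamma)/\gamma \rfloor} p(m; N-1) \sum_{n=0}^{m} sc(n; m) \cdot a_n
 + \sum_{m=\lfloor (g-\gamma)/\gamma \rfloor + 1}^{N-1} p(m; N-1) \sum_{n=0}^{\lfloor (g-\gamma)/\gamma \rfloor} sc(n; m) \cdot a_n
\end{aligned}
$$

But

$$
n > \lfloor (g-\gamma)/\gamma \rfloor \implies sc(n; m) = 0 \implies
\sum_{n=0}^{\lfloor (g-\gamma)/\gamma \rfloor} sc(n; m) \cdot a_n = \sum_{n=0}^{m} sc(n; m) \cdot a_n
\quad \text{if } m > \lfloor (g-\gamma)/\gamma \rfloor.
$$

Thus,

$$
F_2(t) + F_3(t) = \sum_{m=0}^{N-1} p(m; N-1) \sum_{n=0}^{m} sc(n; m) \cdot a_n
= \sum_{m=0}^{N-1} p(m; N-1) \cdot b_m
$$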


Figure 2.3. Graph used to show how to change the order of summation (labels:
m, Region of Summation).

Lemma 5: If $a_n \ge 0$ and $b_m = \sum_{n=0}^{m} sc(n; m) \cdot a_n$, then $a_n$
increasing implies $b_m$ increasing and $a_n$ decreasing implies $b_m$
decreasing.

Proof: Consider the case of $a_n$ increasing, $a_n$ decreasing being similar.

Let $P_m$ be the probability that a job gets blocked on at least one granule
request, given that m jobs have already made their requests. Then

$$
sc(n; m+1) =
\begin{cases}
0 & \text{if } n > m+1 \\
sc(0; m) \cdot P_m & \text{if } n = 0 \\
sc(n; m) \cdot P_m + sc(n-1; m) \cdot (1 - P_m) & \text{otherwise}
\end{cases}
$$

Hence

$$
\begin{aligned}
b_{m+1} &= \sum_{n=0}^{m+1} sc(n; m+1) \cdot a_n \\
&= sc(0; m+1) \cdot a_0 + \sum_{n=1}^{m+1} P_m \cdot sc(n; m) \cdot a_n
   + \sum_{n=1}^{m+1} (1 - P_m) \cdot sc(n-1; m) \cdot a_n \\
&= sc(0; m) \cdot a_0 \cdot P_m + P_m \sum_{n=1}^{m} sc(n; m) \cdot a_n
   + (1 - P_m) \sum_{n=0}^{m} sc(n; m) \cdot a_{n+1} \\
&= P_m \sum_{n=0}^{m} sc(n; m) \cdot a_n + (1 - P_m) \sum_{n=0}^{m} sc(n; m) \cdot a_{n+1} \\
&\ge P_m \sum_{n=0}^{m} sc(n; m) \cdot a_n + (1 - P_m) \sum_{n=0}^{m} sc(n; m) \cdot a_n \\
&= \sum_{n=0}^{m} sc(n; m) \cdot a_n = b_m. \qquad \blacksquare
\end{aligned}
$$

Lemma 6: Let $f(t) = \sum_{m=0}^{N} p(m;N) \, b_m$. Then $b_m$ increasing $\Rightarrow$ $f(t)$ is decreasing
and $b_m$ decreasing $\Rightarrow$ $f(t)$ is increasing.

Proof: This is a variation of a property of Bernstein polynomials; see for
instance [Dav75, p. 114].* The derivative of $f(t)$, $f'(t)$, is calculated, from
which the result is immediate.

\[ f'(t) = \sum_{m=0}^{N} \frac{N!}{(N-m)! \, m!} \, b_m \, (N-m) \, t^{N-m-1} (1-t)^{m} - \sum_{m=0}^{N} \frac{N!}{(N-m)! \, m!} \, b_m \, m \, t^{N-m} (1-t)^{m-1} \]

\[ = N \sum_{m=0}^{N-1} \frac{(N-1)!}{(N-m-1)! \, m!} \, b_m \, t^{N-m-1} (1-t)^{m} - N \sum_{m=1}^{N} \frac{(N-1)!}{(N-m)! \, (m-1)!} \, b_m \, t^{N-m} (1-t)^{m-1} \]

\[ = N \sum_{m=0}^{N-1} \binom{N-1}{m} \, t^{N-m-1} (1-t)^{m} \, (b_m - b_{m+1}) \]

□

* Bernstein polynomials are of the form $B_n(f;x) = \sum_{k=0}^{n} f(k/n) \binom{n}{k} x^k (1-x)^{n-k}$ and the
property referred to is that if $f(x)$ is non-decreasing on $0 \le x \le 1$ then $B_n(f;x)$
is non-decreasing there. The proof of the variation of this property is given in detail above.

It has now been shown (Lemmas 4, 5 and 6) that

Proposition 7: If $(n+1) * S[n+1]$ is an increasing (resp. decreasing) sequence,
then $F_2(t) + F_3(t)$ is a decreasing (resp. increasing) function. □

Further it is shown that:

Proposition 8: $F_1(t) = w_g(t)$ is a decreasing function if $(n+1) * S[n+1]$ is an
increasing sequence.

Proof of Proposition 8:

Lemma 9:

\[ w_g = \frac{w_{TERM} + w_g + w_{MEM} + w_{CS}}{N} \left\{ (N-1) \frac{\gamma}{g} (1-t) + (r-1) \left[ 1 - \left( 1 - \frac{\gamma}{g} (1-t) \right)^{N-1} \right] \right\} \left( 1 - (t + x (1-t))^{N-1} \right) \]

where

\[ x = \binom{g-\gamma}{\gamma} \Big/ \binom{g}{\gamma}. \]

Proof:

\[ w_g = \sum_{i=1}^{\gamma} \sum_{m=1}^{N-1} ProbB(i;m) \, p(m;N-1) \, mw(i) \]

\[ = \lambda \sum_{i=1}^{\gamma} \sum_{m=0}^{N-1} ProbB(i;m) \, p(m;N-1) \quad \text{since } ProbB(i;0) = 0, \; i > 0. \]

\[ = \lambda \sum_{m=0}^{N-1} p(m;N-1) \sum_{i=1}^{\gamma} ProbB(i;m) \]

\[ = \lambda \sum_{m=0}^{N-1} p(m;N-1) \, (1 - ProbB(0;m)) \]

Now $ProbB(0;m)$ is the probability that a job gets blocked on 0 requests
given that $m$ jobs have already made their requests. This is equal to the probability
that each of these $m$ jobs has requested all its $\gamma$ granules from among the
$g - \gamma$ granules not requested by the current job. So,

\[ ProbB(0;m) = x^m \]

Hence,

\[ w_g = \lambda \sum_{m=0}^{N-1} p(m;N-1) \, (1 - x^m) \]

\[ = \lambda \left( 1 - (t + x (1-t))^{N-1} \right) \]


The equation for $\lambda$ is derived similarly. The complete derivation is given in
Appendix A.


X  =  ^  •Tcs  ' 
7 


iN-l) 


y) 


(l-f)  +  (r-l)  ♦ 


1- 


1_2.  ♦  (i_t) 

9 


and  the  result  follows  since  Tcs  =  -  - • 

By Lemma 9

\[ w_g = \frac{w_{TERM} + w_g + w_{MEM} + w_{CS}}{N} \, B(t) \]

where

\[ B(t) = \left\{ (N-1) \frac{\gamma}{g} (1-t) + (r-1) \left[ 1 - \left( 1 - \frac{\gamma}{g} (1-t) \right)^{N-1} \right] \right\} \left( 1 - (t + x (1-t))^{N-1} \right) \]

and

\[ x = \binom{g-\gamma}{\gamma} \Big/ \binom{g}{\gamma}. \]

Solving for $w_g$ it is seen that

\[ w_g = \frac{(w_{TERM} + w_{MEM} + w_{CS}) \, B(t)}{N - B(t)} \]

\[ F_1(t) = \frac{(w_{TERM} + F_2(t) + F_3(t)) \, B(t)}{N - B(t)} \]

It is easy to see that $B(t) < N-1$,* so $F_1(t)$ is well-defined. Also, by examining
the derivative of $B(t)$, it can be seen that $B(t)$ is decreasing. Hence,
$\frac{1}{N - B(t)}$ is decreasing. By Proposition 7, $w_{TERM} + w_{MEM}(t) + w_{CS}(t)$ is
decreasing. Thus $F_1(t)$ is the product of three positive decreasing functions
and must itself be decreasing. □

* Derivative is $< 0$ and $B(0) \le N-1$.

It is now possible to recover a qualitative property of the model. Proposition
10 shows that by increasing the think time, the model predicts that the
total waiting time for memory and central subsystem resources will decrease.
This is intuitive since increasing the think time decreases the fraction of time
jobs spend contending for resources in the central subsystem. Thus the average
number of jobs in the central subsystem is lower and the waiting time is
decreased.

Proposition 10: If $(n+1) * S[n+1]$ is increasing, then increasing the think time,
$w_{TERM}$, decreases $w_{MEM} + w_{CS}$.

Proof: Notice, from (7), that $h(t)$ is of the form

\[ \frac{a \, w_{TERM}}{w_{TERM} + b} = \frac{a}{1 + b / w_{TERM}}, \qquad a, b > 0. \]

Hence, $w_{TERM_2} \ge w_{TERM_1}$ implies that

\[ h_2(t) = \frac{w_{TERM_2}}{w_{TERM_2} + F_1(t) + F_2(t) + F_3(t)} \ge \frac{w_{TERM_1}}{w_{TERM_1} + F_1(t) + F_2(t) + F_3(t)} = h_1(t) \]

Since $h_1(0), h_2(0) > 0$, the smallest fixed point of $h_2$, $t^*_2$, is greater than or
equal to that of $h_1$, $t^*_1$. (See Figure 2.4.)

But, for the think time $w_{TERM_2}$,

\[ w_{MEM_2} + w_{CS_2} = F_2(t^*_2) + F_3(t^*_2) \le F_2(t^*_1) + F_3(t^*_1) = w_{MEM_1} + w_{CS_1} \]

since $F_2(t) + F_3(t)$ is decreasing. □

Note that exactly the same argument shows that if $(n+1) * S[n+1]$ is
decreasing, then increasing the think time increases $w_{MEM} + w_{CS}$. (For real systems,
however, $(n+1) * S[n+1]$ typically increases.) The corresponding statement
for $w_g$ would be nice, but when $w_g$ is written in terms of $t$, a factor of $w_{TERM}$
remains in the equation of $F_1$, complicating the matter, and precluding similar
results.


Figure 2.4. Graph used to show that increasing $w_{TERM}$ decreases $w_{MEM} + w_{CS}$ ($h_1(t)$ and $h_2(t)$ versus $t$).


Chapter  Three 

Centralized Database Model Results and Extensions


3.1.  Solution  of  the  Model 

To assess the validity of some of the assumptions made in the heuristic
analytic model presented in Chapter 2, a simulation program was written. The
simulation program uses the same model input parameters as the analytic
model. The objective is to investigate whether there are identical trends in the
performance measures produced by each model. If the difference between the
answers produced by the two models is consistently only a few percent, then
this would suggest that performance is relatively insensitive to the additional
assumptions made in constructing the analytic model. Of course, since some
assumptions are made in both models, a simple comparison of the results cannot
judge their validity with respect to actual systems. The supposition that
the choice of any particular granule is equiprobable is an example of the latter
type of assumption. An area of further research might be to show that these
kinds of assumptions can be eliminated or that they are reasonable. The particular
assumption mentioned above can be eliminated if information about
how often each granule is accessed is available. This modification is outlined in
section 3.4.

For this experiment, three sets of data were used. For each set of data, five
parameters were modified to determine their effect on performance. The five
parameters were $g$, the number of granules; $N$, the number of customers; $Z$, the
think time; $\gamma$, the number of granules requested per customer; and $ML$, the
maximum allowable multiprogramming level. $g$ was varied from 4 to 10 by 2, $N$
from 4 to 20 by 8, $Z$ from 2 to 22 by 4, $\gamma$ from 1 to 3 by 1, and $ML$ from 1 to 3 by
1.


The procedure used in this experiment was to decide first which value of $r$
to use. Recall that $r$ is the residual life divided by the expected total life of the
job at the head of the queue at the granule when a job arrives and requests its
granules. This determination is based on the input parameters and is a function
of system load. System load in this context is a measure of the amount of
interaction in the CS, so a large value of $N$ or $\gamma$ or a small value of $g$ or $Z$
contributes to a higher system load. $r$ decreases with system load and, based on
simulation results, usually lies in the range (0.5, 1.0). For a system under moderate
loads, an $r$ of about 0.8 seems to work well. As $N$ increases or $Z$, $g$, or $ML$
decreases, the system load increases and a slightly lower value of $r$ is appropriate.
Also, as $\gamma$ rises, the number of granules ($g$) affects the choice of $r$, and similarly
when the service rate in the CS is slow (e.g., in dataset III), more jobs are
kept waiting in queue at the granules and the value chosen for $r$ should be
diminished accordingly.


A simple equation that satisfies these properties was used to generate a
value for $r$. The equation became a part of the iteration. The formula is based
on the following characterization of the system load. If the estimate of the
average number of jobs in the CS is much less than the number of jobs allowed
there, namely the minimum of $ML$ and $\lfloor g/\gamma \rfloor$, then the system is considered to be
in a light load situation. If the estimate of the average number of jobs is close
to the number of jobs allowed, then the system will be considered to be under
heavy load. This leads to the following estimate for $r$:

\[ r = 1 - \frac{N \, \dfrac{w_{CS}}{w_{TERM} + w_g + w_{MEM} + w_{CS}}}{\min\left( \lfloor g/\gamma \rfloor, \; ML \right)} \]

In some heavy load cases, the iteration technique described in equation (1)
of Chapter 2 yielded a value of $r$ that was less than 0.5. In this case, the value
of $r$ was set equal to 0.5. The simulation and analytic models were run using
this technique for generating $r$ to obtain Time in System and Throughput performance
measures. Recall that these performance measures are being
derived for a steady state situation, and that these measures represent means
or expected values only. Distributions for these performance measures are not
available using this solution technique.

The  first  set  of  data  used  the  following  intercompletion  times  for  the  Cen¬ 
tral  Subsystem; 


S[1] = 1.0    S[2] = .50    S[3] = .35


Since  S[i]  is  assumed  to  be  exponentially  distributed  and  is  the  mean  time 
between  departures  from  the  CS,  when  there  are  i  jobs  in  the  CS,  this  first  set  of 
data  modelled  a  system  that  handled  concurrency  extremely  well.  Each  job  in 
the multiprogrammed set progressed at a rate independent of the presence of
concurrent  jobs. 

The  second  set  of  data  models  a  system  that  has  an  intermediate  amount 
of  congestion.  The  intercompletion  times  are: 


The  third  set  of  data  models  a  system  where  concurrent  jobs  encounter  a 
great  deal  of  congestion.  The  intercompletion  times  are: 

S[1] = 1.0    S[2] = 1.0    S[3] = 1.0


3.2. Discussion of the Results

The objective of this study was to produce an analytic model that can be
used to predict performance for a DBMS that exhibits blocking for resources
and uses a particular locking algorithm to maintain consistency. Nine tables
are included in Appendix D. Table I-1 includes results for dataset I with $\gamma = 1$,
Table I-2 shows results for dataset I with $\gamma = 2$, etc. The Throughput and
Time in System results are listed along with the percent difference between
the simulation and analytic models. The simulation was run until the 95%
confidence interval lay entirely within ±5% of the answer. The confidence interval
was generated using either the Student T distribution or the Normal
distribution depending on the number of data points that were available. If the
number of runs (or data points) was less than thirty, then the Student T distribution
was used. Otherwise, the Normal distribution was used. Some of the
information contained in the tables is condensed and summarized in Figures
3.2 through 3.4. In the cases studied, the Throughput predicted by the analytic
model deviated an average of 7.8% from the simulation model and the Time in
System performance measure differed an average of 19.8% from the simulation
model.
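
The stopping rule for the simulation runs can be sketched as follows. This is a minimal illustration, not the simulation program itself: the function names are assumptions, and the Student T critical values are hard-coded from a standard two-sided 95% table because the Python standard library offers no T quantile function.

```python
import math
import statistics

# Two-sided 95% Student T critical values for 1 to 29 degrees of freedom
# (standard table values, hard-coded here as an assumption of this sketch).
T_975 = [12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228,
         2.201, 2.179, 2.160, 2.145, 2.131, 2.120, 2.110, 2.101, 2.093, 2.086,
         2.080, 2.074, 2.069, 2.064, 2.060, 2.056, 2.052, 2.048, 2.045]
# Matching Normal critical value, used once thirty or more runs are available.
Z_975 = statistics.NormalDist().inv_cdf(0.975)


def ci_half_width(samples):
    """Half-width of the 95% confidence interval for the mean of `samples`."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two runs")
    std_err = statistics.stdev(samples) / math.sqrt(n)
    # Fewer than thirty runs: Student T with n - 1 degrees of freedom.
    critical = T_975[n - 2] if n < 30 else Z_975
    return critical * std_err


def precise_enough(samples, rel=0.05):
    """True when the interval lies entirely within +/- rel of the sample mean."""
    return ci_half_width(samples) <= rel * abs(statistics.fmean(samples))
```

A simulation driver would keep adding runs until `precise_enough` returns true for each performance measure of interest.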
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Figure  3.2 


Summary of % Difference Results for Model.


Figure 3.2 is a summary of the percent difference between the simulation
and analytic model results for each value of $\gamma$ and overall. Clearly, the model
works best for $\gamma = 1$.

The assumption that the wait in queue is exponentially distributed with
mean $\lambda$ turned out to be too pessimistic, so in all the datasets the assumption
that the wait was deterministic was made, i.e., that $mw(i) = \lambda$ for all $i$. As
would be expected, the exponential assumption worked slightly better when the
load was light, but the deterministic assumption was acceptable in all cases.

The graph in Figure 3.3 shows the trends that are expected; e.g., Figure 3.3
shows that Throughput rises with increased $g$ and then levels off as there is less
congestion. Figure 3.4 shows that Throughput decreases with increasing $\gamma$.
That is, as customers request more granules, service deteriorates. In fact,
when the number of customers is very high, the throughput for $\gamma = 2$ and $\gamma = 3$
is almost the same. This is because at high loads there are so many conflicts
that the probability of having more than two jobs in the system is very low.

Figure 3.5 shows how Throughput varies with $g$ for larger numbers of
granules and for differing values of $\gamma$. For $\gamma = 3$ there is an anomaly when
$10 < g < 24$ since the multiprogramming level does not have any effect in these
areas. For $\gamma = 4$, the anomaly occurs for $g < 32$. The reason for this unusual
behavior is that there are not enough granules so that each of $ML$ jobs can own
$\gamma$ granules and the actual maximum multiprogramming level is constrained to
be $\lfloor g/\gamma \rfloor$.
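
The constraint can be written as a one-line helper. This is a small sketch with an assumed function name; the value 8 used for the multiprogramming level in the examples is inferred from the thresholds quoted above (24 granules for three granules per job, 32 for four) and is an assumption.

```python
def effective_multiprogramming_level(ml, g, gamma):
    """Largest number of jobs that can each hold gamma of the g granules, capped at ml."""
    return min(ml, g // gamma)


# With an assumed limit of 8, the limit only becomes effective once g reaches 8 * gamma.
levels_gamma_3 = [effective_multiprogramming_level(8, g, 3) for g in (10, 20, 23, 24)]
levels_gamma_4 = [effective_multiprogramming_level(8, g, 4) for g in (10, 31, 32)]
```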

Graph of Throughput versus granularity for various transaction sizes.

Figure 3.6 takes into account the additional overhead that is expected
when there are many granules to keep track of. The tests that were run
involved inflating the intercompletion times at the CS to reflect the locking
overhead. The method used to compute $S_{overhead}$ was:

\[ S_{overhead} = \begin{cases} 0 & \text{for } g \le 10 \\ overhead * (g - 10) & \text{for } g > 10 \end{cases} \]

where $overhead$ in the example was equal to .01.

The input parameters were then calculated from the measured
intercompletion times $S'[i]$, as:

The assumption of a linear locking overhead was made for simplicity. The
model can accommodate a variety of functions for the overhead. Probably the
best choice would be a step function to reflect the situation that the cost to
keep track of the first $x$ granules is a constant, since they can be kept in main
memory. Then, to keep track of the next $y$ granules is another fixed overhead,
namely reading in an additional block of information off of a secondary storage
device, etc. Thus, a step function would probably be the best way to estimate
the locking overhead.
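
The two overhead shapes discussed here can be compared with a short sketch. The linear function follows the rule stated in the text (no overhead up to 10 granules, then .01 per additional granule). The step function is only one possible reading of the suggestion: the block size, base cost, and per-block cost are hypothetical parameters, and equal-sized blocks are assumed for simplicity.

```python
import math


def linear_overhead(g, rate=0.01, free_granules=10):
    """Linear locking overhead: zero up to free_granules, then rate per extra granule."""
    if g <= free_granules:
        return 0.0
    return rate * (g - free_granules)


def step_overhead(g, block_size, base_cost, cost_per_extra_block):
    """Step overhead (hypothetical): a constant for the first block of granules,
    plus a fixed cost for every additional block that must be read in."""
    extra_blocks = max(0, math.ceil(g / block_size) - 1)
    return base_cost + cost_per_extra_block * extra_blocks


linear_at_25 = linear_overhead(25)
step_at_25 = step_overhead(25, block_size=10, base_cost=0.0, cost_per_extra_block=0.05)
```

How the overhead is then combined with the measured intercompletion times is not shown here, since that step is not recoverable from the text.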

The number of granules required by the transaction should also be
changed because as the granularity of the database is made finer, the transaction
may be forced to request more granules to do its processing. Figure 3.6
shows both bounds that can be expected. The lower bound on Throughput
occurs when the ratio of $g$ to $\gamma$ remains fixed. For this bound the assumption is
made that if, when the database has ten granules, the transaction requests two
granules, then if the granularity is changed so that there are twenty granules,
the transaction will require four granules. This is a pessimistic assumption.
The optimistic counterpart would be that the transaction need not request
more granules even though the granularity becomes finer. These bounding
cases are both shown in Figure 3.6 and are labelled PESS and OPT, respectively.

The actual situation would have to be determined by observing the transactions
and investigating their locality properties. If the transactions tended to
access granules in the same locality, then the performance would be close to
that of the optimistic case. Since the objective of having finer granularity is to
allow more parallelism, it would be the job of the system analyst to adjust the
granularity so that the performance is as close to the optimistic case as possible,
i.e., so that the granules are set up so that each transaction requires as few
granules as possible.

The optimistic case confirms results reported by Ries that the system performance
is improved for coarse granularity because the overhead dominates
when there is fine granularity [Rie79]. Ries found that the optimal granularity
was around 40 granules, within the assumptions that were made. The algorithm
investigated by Ries allowed restarting of transactions and this is what
accounted for the low throughput levels when the granularity was too coarse
[Rie79]. In the algorithm investigated in this thesis, the transactions are never
restarted. This could account for why the optimal granularity was found to be
less than 40; e.g., in Figure 3.6 the optimal value is about 25 granules. Of
course, this figure depends on the amount of locking overhead that the system
generates as a function of the granularity. Since Ries does not account for the
overhead required to scan the blocked queue, the locking overhead calculated
by the model may not be as dependent on the granularity as it should be.

The locking overhead dominates at finer levels of granularity if the
granules are not well-chosen, as the performance then tends toward the
pessimistic case in Figure 3.6. If the granules are well-chosen, then the possibility
of improved performance is clear from Figure 3.6.

3.3.  Conclusions 

The main advantage of this analytic technique over the simulation is the
computer time saved. The simulation runs required minutes of computer time,
more than ten times more computer time than the analytic model, to produce
confidence intervals so that the results were within 5% of the mean, 95% of the
time. For a particular session where six sets of data were run, the simulation
required 13 minutes, 11.4 seconds to complete. For the same sets of data, the
analytic model required 35.2 seconds to complete. Both tests were run on a
PDP 11/45 running the UNIX operating system. It should also be pointed out
that the simulation was programmed in such a way as to make it reasonably
efficient, whereas the analytic solution package was programmed in such a way
that changes were easy to carry out, and so that the structure of the equations
remained intact and was clearly visible. Since the answers produced by the
two models are close, this saving in computation time is significant. It was
shown that this technique will produce a fixed point; however, nothing was said
about the speed of convergence. In almost all cases studied, the analytic technique
converged in less than 15 iterations with an absolute tolerance of .001,
consuming a small amount of CPU processing time. A tolerance of .001 means
that the iteration terminated when $w'_g$, $w'_{MEM}$, and $w'_{CS}$ each differed by less
than .001 from $w_g$, $w_{MEM}$, and $w_{CS}$, respectively, in equation (1) of section 2.3.
Also, no cases with more than one fixed point were observed. Other advantages
of analytic models over simulation models are described in [Gra78].
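
The termination rule can be illustrated with a generic sketch. The `update` function below is a hypothetical stand-in for one pass through the model equations (it is not equation (1) of section 2.3); only the stopping test, an absolute tolerance applied to each of the three waiting times, reflects the description above.

```python
def solve_by_iteration(update, start, tol=0.001, max_iter=1000):
    """Iterate w <- update(w) until every component changes by less than tol.

    Returns the final vector and the number of iterations used.
    """
    current = tuple(start)
    for iteration in range(1, max_iter + 1):
        nxt = tuple(update(current))
        if all(abs(new - old) < tol for new, old in zip(nxt, current)):
            return nxt, iteration
        current = nxt
    raise RuntimeError("no convergence within max_iter iterations")


def toy_update(w):
    """Hypothetical contraction with fixed point (2, 2, 2), for illustration only."""
    w_g, w_mem, w_cs = w
    return (0.5 * w_mem + 1.0, 0.5 * w_cs + 1.0, 0.5 * w_g + 1.0)


waits, iterations_used = solve_by_iteration(toy_update, (0.0, 0.0, 0.0))
```

Note that the test compares successive iterates, so the distance to the true fixed point can be somewhat larger than the tolerance when convergence is slow.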


3.4.  Bounds  on  the  Performance  of  the  Nonuniform  Model 


The original model that incorporates the assumption that the choice of any
particular granule is equiprobable can be used to find bounds on the performance
of a system where the uniformity assumption does not hold. This claim
is based on a technique for bounding the performance of separable queueing
networks using balanced job bounds (BJBs) [ZSEG82].


Consider a system in which $N$ statistically identical jobs circulate among $K$
devices such that the total service required per job completion at each device
is given by the loadings $L_1, L_2, \ldots, L_K$. Let $R_0$ be the sum of the $K$ loadings and
let $L_b$ denote the maximum (or bottleneck) of the $K$ loadings. Let $L_a$ be the average
loading, i.e., $R_0 / K$. BJBs provide upper and lower bounds on throughput
and response times. Throughput, for example, is bounded in the following way
for a job population $N$:

\[ \frac{N}{R_0 + (N-1) L_b} \le \text{throughput with } N \text{ customers} \le \frac{N}{R_0 + (N-1) L_a} \]

The upper bound represents the solution of a balanced network consisting
of $N$ jobs and $K$ devices, each with loading $L_a$, representing a total loading of $R_0$.
The lower bound represents the solution of a balanced network consisting of $N$
jobs and $R_0 / L_b$ devices, each with loading $L_b$. Since, in general, $R_0 / L_b$ is
non-integral, the corresponding system is hard to interpret physically. In the
uniform choice model, the number of granules must be integral, so it is necessary
to find a different lower bound (perhaps not so tight) that can be mapped
into a physically realizable system. The new lower bound, which is not as good
as the one mentioned in [ZSEG82], will correspond to a system with an integral
number of devices obtained by rounding $R_0 / L_b$, each with loading $L_b$. More
detail about balanced job bounds can be found in [ZSEG82].
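
The throughput bounds can be evaluated directly from the loadings. The sketch below assumes the standard form of the balanced job bounds for a closed network without think time, as in the inequality above; the function name and the sample loadings are illustrative assumptions.

```python
def balanced_job_bounds(loadings, n_jobs):
    """Return (lower, upper) balanced job bounds on throughput for n_jobs jobs.

    loadings -- total service demand per job completion at each device
    """
    r0 = sum(loadings)            # total loading
    l_b = max(loadings)           # bottleneck loading
    l_a = r0 / len(loadings)      # average loading
    lower = n_jobs / (r0 + (n_jobs - 1) * l_b)
    upper = n_jobs / (r0 + (n_jobs - 1) * l_a)
    return lower, upper


# A balanced network: both bounds coincide.
balanced_lower, balanced_upper = balanced_job_bounds([1.0, 1.0], 2)
# An unbalanced network (hypothetical loadings): the bounds separate.
skewed_lower, skewed_upper = balanced_job_bounds([0.8, 0.2], 4)
```

The gap between the two bounds widens as the loadings become more unbalanced, which is the behavior exploited in the granule examples that follow.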
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The  same  technique  can  be  used  to  bound  the  performance  of  a  computer 
system  where  the  choice  of  granule  is  nonuniform.  To  do  this,  two  models  that 
bound  the  performance  of  the  nonuniform  system  under  consideration  are 
solved  under  the  uniform  access  assumption.  The  loadings  correspond  to  the 
access  probabilities  for  the  granules. 

For  example,  suppose  that  there  are  ten  granules  with  the  following  access 
probabilities; 

$p_c(1) = .2$    $p_c(2) = .15$    $p_c(3) = .15$    $p_c(4) = .1$    $p_c(5) = .1$

$p_c(6) = .1$    $p_c(7) = .05$    $p_c(8) = .05$    $p_c(9) = .05$    $p_c(10) = .05$

Then,  using  the  same  reasoning  as  for  balanced  job  bounds,  an  optimistic 
bound  on  the  performance  of  the  system  can  be  obtained  by  solving  the  uni¬ 
form  model  with  ten  granules.  The  pessimistic  bound  is  derived  by  solving  the 
uniform  model  with  five  granules.  The  number  five  was  obtained  using: 

\[ \frac{\sum_i p_c(i)}{\max_i p_c(i)} \]
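
The granule count for the pessimistic bound can be computed as below. This is a sketch under one assumption: a non-integral ratio is rounded down, which is inferred from the second example in this section (a ratio of 1.25 treated as 1 granule). Exact fractions are used so that a ratio such as 1/.2 is not pushed below 5 by floating-point rounding.

```python
from fractions import Fraction


def pessimistic_granule_count(access_probabilities):
    """Sum of the access probabilities divided by the largest one, rounded down.

    access_probabilities -- decimal strings, e.g. "0.15", parsed exactly
    """
    probs = [Fraction(p) for p in access_probabilities]
    return int(sum(probs) // max(probs))


first_example = ["0.2", "0.15", "0.15", "0.1", "0.1",
                 "0.1", "0.05", "0.05", "0.05", "0.05"]
second_example = ["0.8", "0.06", "0.025", "0.025", "0.025",
                  "0.025", "0.01", "0.01", "0.01", "0.01"]

first_count = pessimistic_granule_count(first_example)
second_count = pessimistic_granule_count(second_example)
```

The optimistic bound simply uses the actual number of granules, ten in both examples.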

Figure 3.7 shows the results of an experiment where the access probabilities
were assumed to be those given in the above example. The bounds on
Throughput are shown as well as the Throughput predicted by a simulation of
the nonuniform system.

As another example, consider a system that has a very nonuniform pattern
of reference for its granules. Suppose, for example, that there are ten granules
with the following unbalanced access probabilities:

$p_c(1) = .8$    $p_c(2) = .06$    $p_c(3) = .025$    $p_c(4) = .025$    $p_c(5) = .025$

$p_c(6) = .025$    $p_c(7) = .01$    $p_c(8) = .01$    $p_c(9) = .01$    $p_c(10) = .01$

Figure 3.8 shows the two uniform systems that bound the throughput of
the unbalanced system. Also, simulation results of the nonuniform access
situation are shown. The two bounding cases are with 1 and 10 granules. Since
granule number one provides such a distinct bottleneck, the performance of
the system can be expected to be approximated best by the pessimistic uniform
system. As seen in Figure 3.8, the optimistic bound is not as tight as one might
like.

When  there  is  a  fixed  number  of  devices  in  a  computer  system  and  a  fixed 
total  load,  the  optimal  distribution  of  the  load  would  be  to  place  it  as  evenly  as 
possible over all the devices, in other words to eliminate any severe
bottlenecks.  Unfortunately,  it  is  difficult  to  transfer  the  load  sometimes,  e.g., 
from  a  disk  drive  to  a  printer.  Usually  all  that  can  be  done  is  to  transfer  some 
processing  from  one  disk  to  another.  When  a  system  analyst  is  deciding  how  to 
allocate  the  granules,  there  may  be  more  flexibility  so  that  the  access  can  be 
made  to  be  more  uniform. 

3.5.  A  Modification  to  the  Locking  Algorithm  Scheduling  Policy 

In  this  section  the  original  model  will  be  slightly  perturbed  to  show  how  the 
analytic technique can be used to model a different locking algorithm. The
model  will  be  used  to  model  the  performance  of  a  system  similar  to  that  used  to 
generate  the  results  reported  in  Figure  3.5.  Differing  levels  of  overhead  will  be 
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considered  to  see  the  effect  on  system  performance. 

The algorithm considered is similar to the one presented in [Rie79] and
different from that in the original model. In the original model, a job made its
request  for  its  needed  granules  and  waited  FCFS  until  all  its  requests  could  be 
granted.  Meanwhile,  no  jobs  were  allowed  to  pass  each  other  in  line.  In  this 
way,  several  problems  common  to  many  concurrency  control  mechanisms  were 
avoided,  among  them  deadlock  and  indefinite  postponement.  The  tradeoff  is 
that  some  jobs  that  may  have  been  able  to  proceed  are  unable  to  proceed  as  a 
result  of  a  job  blocking  them. 

The  algorithm  used  in  [Rie79]  allows  jobs  to  pass  each  other  as  soon  as 
they  can  obtain  all  their  granules.  If  they  cannot  obtain  all  their  granules, 
however,  they  must  wait  until  the  granules  are  all  free.  During  this  wait  they 
do  not  hold  locks  on  any  of  the  granules.  For  this  reason  deadlock  cannot 
occur,  but  indefinite  postponement  is  still  a  possibility  in  this  algorithm.  How¬ 
ever,  as  long  as  the  arrival  rate  of  jobs  does  not  saturate  the  system,  indefinite 
postponement  will  not  occur.  The  results  reported  by  Ries  are  not  directly 
comparable  with  the  results  obtained  using  the  model  described  here  because 
in  [Rie79],  a  parameter  called  RAD  is  used  to  determine  the  distribution  of  the 
sizes  of  the  transactions.  In  this  model,  the  assumption  is  made  that  all  tran¬ 
sactions  choose  the  same  number  of  granules.  Ries  does  not  consider  any 
cases  where  RAD  is  chosen  so  that  all  transactions  are  the  same  size. 

The necessary change for the original model is in the equation for $w_g$. $w_g$
becomes:

\[ w_g = \sum_{m} \left[ \text{probability of } m \text{ not at terminals} \right] \left[ \text{wait at granule given } m \text{ not at terminals} \right] \]

where the wait at the granule given $m$ jobs are not at the terminals is:

\[ \sum_{n=1}^{m} \left[ \text{probability of } n \text{ customers available to run given } m \text{ have been considered} \right] \left[ \text{wait at granule given that } n \text{ customers are available to run} \right] \]

If there are $n$ jobs available to run after $m$ jobs have been considered by
the scheduler, then there are $m-n$ jobs in the granule queues that have not
yet proceeded to the MEM_queue or the CS. They will proceed at a rate that
depends on the number of jobs in the CS and the maximum multiprogramming
level. The rate will be $1/S[\min(ML,n)]$ where $n$ is the number of jobs available
to run. The residence time that can be expected when there are $m-n$ jobs
proceeding at this rate is obtained with a simple application of Little's Law. The
wait at a granule given there are $n$ customers available to run when $m$ have
been considered is:

\[ (m - n) \, S[\min(ML,n)] \]

$w_g$ can now be written as:

\[ w_g = \sum_{m=1}^{N-1} p(m;N-1) \sum_{n=1}^{m} p_q(n;m) \, (m - n) \, S[\min(ML,n)] \]

where $p_q(n;m)$ is the probability of $n$ customers being available to run given $m$
have been considered. It is defined recursively as:

\[ p_q(n;m) = \begin{cases} 0 & \text{if } m<n \text{ or } n<0 \text{ or } m<0 \text{ or } n>g/\gamma, \text{ or if } n=0 \text{ and } m \ne 0 \\ 1 & \text{if } n=m=0 \\ p_q(n-1;m-1) \, \dfrac{\binom{g-\gamma(n-1)}{\gamma}}{\binom{g}{\gamma}} + p_q(n;m-1) \left[ 1 - \dfrac{\binom{g-\gamma n}{\gamma}}{\binom{g}{\gamma}} \right] & \text{otherwise} \end{cases} \]

The motivation for $p_q(n;m)$ is that there are only two ways for there to be $n$
jobs available to run after considering $m$ jobs. (1) After considering $m-1$ jobs
with $n-1$ jobs available to run, the next job becomes available to run. This happens
with probability:

\[ \binom{g-\gamma(n-1)}{\gamma} \Big/ \binom{g}{\gamma} \]

since there are $n-1$ jobs requiring $\gamma$ granules already holding those granules.
There are $g - \gamma(n-1)$ granules left for the arriving job to choose $\gamma$ granules
from. The total number of ways to choose the granules is $\binom{g}{\gamma}$ and the above
probability follows. (2) After considering $m-1$ jobs with $n$ jobs available to run, the
next job does not get all its granule requests and cannot proceed. This happens
with probability:

\[ 1 - \binom{g-\gamma n}{\gamma} \Big/ \binom{g}{\gamma} \]

The formula for $w_g$ follows.
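
The recursion for the probability that n of the m jobs considered are available to run can be evaluated with memoization. The sketch below follows the two cases described above (the next job finds its granules free, or it is blocked), with both probabilities derived from the counting argument in the text. The function names are assumptions, and the wait is taken as the number of queued jobs times the intercompletion time, which is one reading of the Little's Law step.

```python
from functools import lru_cache
from math import comb


def make_pq(g, gamma):
    """Build pq(n, m): probability that n jobs can run after m have been considered."""
    ways = comb(g, gamma)  # ways for one job to choose gamma of the g granules

    @lru_cache(maxsize=None)
    def pq(n, m):
        if m < n or n < 0 or m < 0 or n * gamma > g:
            return 0.0
        if n == 0:
            return 1.0 if m == 0 else 0.0
        # Next job finds gamma free granules while n - 1 jobs hold theirs.
        gets_granules = comb(g - gamma * (n - 1), gamma) / ways
        # Next job is blocked while n jobs hold their granules.
        is_blocked = 1.0 - comb(g - gamma * n, gamma) / ways
        return pq(n - 1, m - 1) * gets_granules + pq(n, m - 1) * is_blocked

    return pq


def wait_given_m(pq, m, S, ml):
    """Expected wait at the granule queues given m jobs are not at the terminals.

    S -- dict mapping the number of jobs in the CS to the mean intercompletion time
    """
    return sum(pq(n, m) * (m - n) * S[min(ml, n)] for n in range(1, m + 1))


pq = make_pq(g=10, gamma=2)
total_for_m4 = sum(pq(n, 4) for n in range(0, 5))
wait_for_m2 = wait_given_m(pq, 2, {1: 1.0, 2: 0.6}, ml=8)
```

For any m of at least one, the probabilities over n sum to one, which gives a quick sanity check on an implementation.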

The model was used to predict the performance of a system with the following
parameters: $N = 10$, $ML = 8$, $Z = 5$, S[1] = 1.0, S[2] = 0.6, S[3] = 0.6, S[4] =
0.5, S[5] = 0.4, S[6] = 0.3, S[7] = 0.2, S[8] = 0.2.

The granularity was varied from 10 to 100 granules and the overhead for
locking was set at 0, 0.001, and 0.01. The results are shown in Figure 3.9 and Figure
3.10. The optimal choice of granularity is greater than 100 if there is no
overhead, about 30 if the locking overhead is 0.001, and less than 10 if the overhead
is 0.01. Also shown is the performance that can be expected from the original
model with the overhead set to 0.01. When $\gamma = 1$, the original model can
be expected to have approximately the same performance as the modified
scheduler since no jobs are delayed unnecessarily in either model. When $\gamma > 1$,
the modified scheduling mechanism can be expected to have a higher
throughput rate than the original model scheduler. This is because in the original
model there are situations where a job may have to wait even when its
granules are not claimed by a job in the CS. They may be jointly claimed by
other jobs waiting in the granule queues. Figure 3.10 shows the results for a
system similar to that used for Figure 3.9 where the jobs request three
granules each, i.e., $\gamma = 3$. The modified scheduler is seen to bound the performance
as expected.

It is conjectured that locking algorithms where the probability of deadlock
is non-zero may be hard to model analytically unless the probability of deadlock
can be quantified. Recent results show that even when using these locking
algorithms the occurrence of a deadlock is rare [Ber82]. If this is the case then
analytic models of these systems may be possible as well. The possibility of
deadlock could be ignored until after the model had been solved, at which point
it could be noted, for example, that the presence of a deadlock could only hurt
the performance. This would indicate that the performance measures
generated were optimistic ones. It should be pointed out that some operating
systems in use today detect deadlock by having a user notice that his process is
taking an inordinate amount of time. The user then uses his BREAK key to kill
the process, thus solving the deadlock problem. Since these operating systems
really do exist and do function, this lends credibility to the claim that deadlock
is indeed a rare occurrence. Whether this phenomenon occurs in the database
environment remains an open question.
Chapter Four

Concurrency  Control  Mechanisms  in  Distributed 
Database  Management  Systems 


4.1.  Introduction 

Modelling distributed database management systems (DDBMS) involves an
added level of complexity not present in centralized database systems.
Garcia-Molina did a study of distributed databases with replicated data using both
simulation and analytic models [GaMo79]. He found that his iterative analytic
model did not produce very good results at high load; however, at low load it did
agree quite well with simulation results at a much lower cost. For his model,
the communication delay was assumed to be a fixed constant. He considered
several different concurrency control mechanisms in his study, among them
several variations of Centralized Two-phase Locking (described in section 4.6.2)
and a distributed voting algorithm. He concluded that he was unable to
determine an algorithm that was superior due to the simple nature of the model, but
that the centralized control algorithms should bear consideration.

In a more recent analytical study by Shum and Spirakis, several forms of
Basic 2PL (described in section 4.6.1) were modelled [ShSp81]. Their approach
was to bound the performance of the algorithm by using algorithms that were
guaranteed to have more restarts, thus worse performance. They assume that
there is no redundancy in the database and in their distributed version they
assume there is one granule per database site. These assumptions are all
necessary so that the analytic model is tractable. When the results are
compared to global balance solutions of the same systems, the results are indeed
bounds but they are not very tight. Analytic modelling of DDBMSs has not been
very successful to date.

In this chapter the transaction processing model described by Bernstein
and Goodman [BeGo80] will be used to study the behavior of concurrency
control methods for DDBMSs. In this model the DDBMS is considered to be a
collection of sites that communicate through a communication network. Each site is
a computer containing one or both of two software modules: a transaction
manager (TM) and a data manager (DM). The TMs control the user interaction
with the database and the DMs manage the actual database. Thus, the TMs can
be thought of as providing logical data independence while the DMs provide
physical data independence.

The communication network is assumed to be perfectly reliable so that if a
message is sent from site A to site B, site B is guaranteed to receive the
message without error and within a finite amount of time. This level of service can
be guaranteed by the communication protocols.

4.2.  Transactions 

Users interact with the DDBMS by running transactions (sometimes
referred to as jobs), either on-line or batched. Transactions come in many
flavors. They may be queries, report generating programs, application
programs, etc. However, in this model, it is only important to know which copies of
data items the transactions access and update. A data item is a file, record, or
field. The granularity of data items is left unspecified at this point. Associated
with each transaction is a read (write) set which is composed of those data
items that the transaction reads (writes). The transaction is submitted at the
origination site and may then be transferred to a site where it will be
processed. It is possible that the processing may be distributed across many sites;
however, in this thesis it will be assumed that each transaction is processed at a
single site.

Transactions are assumed to represent complete and correct computations
so that if one were run alone on a database that was consistent, then the
database would again be consistent after the transaction was run. A consistent
database is one that meets a number of pre-defined integrity constraints (e.g.,
that the number of widgets on hand equals the number of widgets received
minus the number of widgets delivered).

A transaction in the model is composed of a BEGIN followed by a sequence
of READs and WRITEs followed by an END. These are issued to the TM which then
performs the necessary operation, either to retrieve (READ) or update (WRITE)
the stored data items. Each transaction has a private workspace (created by a
BEGIN operation and kept at a single site) where the transaction’s data is
stored. When the transaction issues a READ and if the value to be retrieved is
not already stored in the transaction’s workspace, the TM retrieves the value
from the database by issuing a dm-read to the DM on behalf of the transaction.
If the data item is stored locally, it will most likely be retrieved from the local
database; however, it may be retrieved from a remote site. When the
transaction issues a WRITE and the value to be written is not already stored in the
transaction’s workspace, the TM creates an entry in its private workspace for
the data item and sets the value of the data item in the workspace to what is to
be written. If the data item is already in the transaction’s workspace, then its
value is updated there. When an END is encountered by the TM, a dm-write is
issued for each data item to be updated and the values in the private workspace
are sent to all the database sites to be permanently stored.
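
The handling of BEGIN, READ, WRITE, and END can be sketched in a few lines. This is a minimal illustration, not an implementation from the thesis: the class names `DataManager` and `Transaction` and the method names are hypothetical, every site is assumed to hold a full copy of the data, and reads are always served by the first DM in the list.

```python
class DataManager:
    """Hypothetical stand-in for a DM: a dictionary of stored data items."""

    def __init__(self, items):
        self.items = dict(items)

    def dm_read(self, key):
        return self.items[key]

    def dm_write(self, key, value):
        self.items[key] = value


class Transaction:
    """Hypothetical TM-side view of one transaction and its private workspace."""

    def __init__(self, dms):
        # BEGIN: create an empty private workspace.
        self.dms = dms
        self.workspace = {}
        self.write_set = set()

    def read(self, key):
        # READ: issue a dm-read only if the value is not in the workspace yet.
        if key not in self.workspace:
            self.workspace[key] = self.dms[0].dm_read(key)
        return self.workspace[key]

    def write(self, key, value):
        # WRITE: create or update the workspace entry; the database is untouched.
        self.workspace[key] = value
        self.write_set.add(key)

    def end(self):
        # END: issue a dm-write for each updated item at every site holding it.
        for key in self.write_set:
            for dm in self.dms:
                dm.dm_write(key, self.workspace[key])


site_a = DataManager({"x": 1, "y": 2})
site_b = DataManager({"x": 1, "y": 2})
t = Transaction([site_a, site_b])
t.write("x", t.read("x") + 10)
value_before_end = site_a.items["x"]   # still the old value
t.end()
```

Until `end` is called the stored copies are unchanged, which is what allows a transaction to be aborted simply by discarding its workspace.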

4.3. Concurrency Control

When several users are simultaneously issuing transactions it is necessary
to order access to the data. The mechanisms to do this are called Concurrency
Control Methods (CCMs); they are necessary in a multi-user environment to
maintain integrity constraints, consistency between redundant copies of data
items, and to control deadlock, indefinite postponement, and cyclic restart.
CCMs are also necessary to guarantee the user that his transaction will perform
the same computation as it would in a single user environment. Deadlock will
be discussed subsequently. Indefinite postponement occurs when a particular
transaction is continually blocked by other jobs in such a way that the blocked
job never terminates. Cyclic restart is the phenomenon that a job gets
continually restarted as it tries to make progress [ShCo72].

Several mechanisms can be used to implement CCMs. For example, a
conflicting transaction may be aborted and restarted. Two transactions conflict
if the portion of the database that one of the transactions writes intersects with
the part of the database the other transaction reads or writes. When a
transaction is aborted and restarted, the transaction’s private workspace is removed
and the transaction is restarted from the beginning. A transaction may be
aborted up until the time at which it begins to store data permanently in the
database. (See section 4.4 for details.) Whenever a job is aborted by a
concurrency control mechanism it is then automatically restarted.
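
The definition of a conflict translates directly into a test on read and write sets. The following is a small sketch under the assumption that each set is represented as a Python `set` of data item names; the function name `conflicts` is hypothetical.

```python
def conflicts(read_a, write_a, read_b, write_b):
    """Return True if what one transaction writes intersects with what the
    other transaction reads or writes."""
    a_hits_b = bool(write_a & (read_b | write_b))
    b_hits_a = bool(write_b & (read_a | write_a))
    return a_hits_b or b_hits_a


# Two readers of the same item do not conflict; a reader and a writer do.
readers_only = conflicts({"x"}, set(), {"x"}, set())
read_write = conflicts({"x"}, set(), {"y"}, {"x"})
write_write = conflicts(set(), {"x"}, set(), {"x"})
disjoint = conflicts({"x"}, {"y"}, {"u"}, {"v"})
```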
4.4. Reliability

If a site fails after it has written some of a transaction's updates but before
it finishes all of them, then the database may be in an inconsistent state when
it returns to normal operation. For this reason, a technique called two-phase
commitment is used.*

The first phase of two-phase commitment begins when a transaction
issues an END. The DM places all the values to be updated onto some secondary
storage device (e.g., disk) at each site by issuing a pre-commit for each copy of
the data items to each of the sites that store the data item. If phase one is not
completed, no permanent damage is done since the database has not been
altered. During phase two, update requests are sent to each site and the values
are updated in the database. If the DBMS fails during phase two it can recover if
the database is inconsistent because the values are retained in secondary
storage. Before the messages to start phase two can be sent, the processing
site must receive acknowledgments from each relevant site in the DDBMS
indicating that phase one has been completed successfully.**
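
The two phases can be illustrated with a toy coordinator. This is a sketch under simplifying assumptions, not the protocol as implemented in any particular system: the class `Site`, its `accepts` switch, and the function `two_phase_commit` are hypothetical, messages are modelled as direct method calls, and site failures are not modelled at all.

```python
class Site:
    """Hypothetical database site with a database and secondary storage."""

    def __init__(self, name, accepts=True):
        self.name = name
        self.accepts = accepts   # illustrative switch used to force a rejection
        self.database = {}       # permanently stored values
        self.secondary = {}      # transaction id -> values saved by a pre-commit

    def pre_commit(self, txn_id, updates):
        # Phase one: save the new values on secondary storage and acknowledge.
        if not self.accepts:
            return False
        self.secondary[txn_id] = dict(updates)
        return True

    def update(self, txn_id):
        # Phase two: install the saved values in the database.
        self.database.update(self.secondary.pop(txn_id))

    def discard(self, txn_id):
        self.secondary.pop(txn_id, None)


def two_phase_commit(txn_id, updates, sites):
    acks = [site.pre_commit(txn_id, updates) for site in sites]
    if not all(acks):
        # Phase one did not complete: no database has been altered.
        for site in sites:
            site.discard(txn_id)
        return False
    for site in sites:
        site.update(txn_id)
    return True


a, b = Site("A"), Site("B")
committed = two_phase_commit("T1", {"x": 5}, [a, b])
c, d = Site("C"), Site("D", accepts=False)
aborted = two_phase_commit("T2", {"x": 7}, [c, d])
```

The sketch waits for every site to acknowledge before starting phase two, which matches the rule stated above for the processing site.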

Two-phase commitment will be included in our model of transaction
processing since it has an impact on the performance of concurrency control
methods. It will be assumed, however, that sites never go down, and this thesis
will not address the problem of recovery when a site becomes operable after a
crash. For a discussion of some concurrency control problems encountered
when sites and communication links fail, see [Eag81]. The conclusions in
[Eag81] show that extensions to CCMs exist that provide robustness and have
minimal overhead when there are no site or communication link failures. It is
reasonable to assume that these types of failures are rare, and since the
overhead incurred in the absence of failures is low, conclusions drawn about
performance assuming perfect reliability are still valid when failures occur with low
frequency.

4.5. Phases of a Transaction

Bernstein and Goodman show that most CCMs can be thought of as the
solution to two subproblems: synchronizing reads with writes, and
synchronizing writes with writes [BeGo80]. They point out that the algorithms that solve
these synchronization problems can be combined in many different ways to
form integrated CCMs. They describe forty-eight such algorithms and point out
that variations of each technique can lead to thousands of CCMs. This creates
the additional problem of having to choose one.

The new framework presented in this thesis is not intended to be a way of
generating new methods, but it is intended to be a convenient way to cast an
algorithm so that performance comparisons with other algorithms can be
made. By taking into account additional information about the DDBMS, the
system designer may be able to restrict his choice of CCM to a small number of
methods.

As many as four types of database sites may be associated with a single
transaction. The transaction is submitted by the user at the origination site.
The transaction may then be transferred to the processing site where it is to be
run. It is assumed that all processing is done at a single site.*** After that it
goes through a data gathering phase at the data source sites. When it has
finished its computation, it sends pre-commits to all the destination sites, the
sites where the data items in the transaction’s write set are stored. Finally,
when the destination sites have acknowledged the pre-commits, the update
request is sent, and phase two of the two-phase commitment process may
begin. The phases of a transaction are pictured in Figure 4.1 along with the
site(s) that are involved.

*Not all CCMs require two-phase commitment. For example, in [GeSe78] there is a
mechanism for undoing completed transactions. Because of this capability, the consistency in the
database can be maintained without requiring two-phase commitment techniques.

**In some algorithms, it is not necessary to hear from all of the sites; e.g., in [Tho79] only a
majority of sites is required.

***Some systems allow a single transaction's processing to be distributed; however, these
will not be considered here.

In this transaction model, the steps that will be considered are those that
occur at the processing site, the data source sites, and the destination sites.
The problem of determining at which site a transaction should run has been
investigated in the literature but will not be considered here [BGWTR81]. The
assumption that the origination site and the processing site coincide will be
made. The sites carry out the six operations listed in Figure 4.2. To execute
these six operations there are only ten interactions with other database sites
that occur. A site may (1) request, (2) pre-commit, or (3) update a data item
value. The site may receive a (4) request for the value of a data item, a (5)
request to pre-commit, or a (6) request to update a data item. In addition, a
site may also (7) accept a read and transfer the necessary data, (8) reject a
read, (9) reject a pre-commit, or (10) acknowledge that a pre-commit has been
received and accepted. These correspond to the different types of messages
that will be transmitted among the database sites in the network to carry out
the operations in Figure 4.2. One factor in determining the performance of a
concurrency control method is the amount of time that each of these
operations requires times the frequency with which they are executed.

To cast each algorithm in this framework, a particular database site and its
relation to the outside world is considered. Figure 4.2 outlines the possible
activities in which the Concurrency Control Mechanisms are involved. Various
algorithms cause different delays for each of the activities, as will be seen later
on.

Figure 4.3 shows the model as a queueing network. This is useful to
describe the processing in the system; however, the techniques available to
solve traditional queueing network models are not easily applied here. Reasons
for this will be presented later in this section. A transaction starts at the
terminals and then cycles through the network. It does its local processing at the
CPU and devices. When it needs additional data from another site, or it needs
to update another site, it changes class and visits the other site to do the
processing. The primed classes represent requests from or messages to the other
sites in the network. A class change to class 0 models a transaction being
rejected and restarted.

Figure 4.4 shows a two-site system where the two databases are assumed to
be identical. In a DDBMS that is fully replicated it would usually be foolish to
send a read request to another site when the information can be obtained
locally without the added overhead required to send the message. The time
when remote reads would be useful is when a particular database file at a
particular site is unavailable because of a disk problem, or in a non-fully replicated
database, for example.* Since in this example it is assumed that there are no
site failures and that the database sites are identical, it will also be assumed
that all read requests are processed locally** and that only updates are
transmitted to the other sites.


will  not  be  considered  here. 

*Another situation when it might be advantageous to do a non-local read is if the local copy
is locked; however, this seems unlikely to prove useful and will not be considered further.
**Centralized 2PL is an exception to this rule as will be explained later in this thesis.
Figure 4.1

Transaction processing broken down into constituent phases.

  - pre-commit is acknowledged
  - pre-commit is rejected and rejection message is sent over network

DM sends an update request elsewhere
DM receives an update request from elsewhere
  - update request is sent over network

Figure 4.2

Operations performed by a transaction and the possible interactions with
other databases that result.

Figure 4.3

Multi-class queueing network portrayal of a single site in a
DDBMS and its connection with other sites in the network.
The unknown quantities needed to solve this model are shown in Figure
4.5. Also shown in Figure 4.5 is an indication of whether the quantity depends
on the particular algorithm being used or whether it depends on the
transaction workload. Several of the algorithm-dependent parameters are not easily
obtainable. For example, the length of time required to process a pre-commit
varies from algorithm to algorithm, and the activity is not captured well in a
queueing network model. The delay is dependent on the activity going on at the
other site. A class 2 job at database site DB must, in general, continue to block
other conflicting transactions until the pre-commit is completed at database
site DB’. This represents a form of simultaneous resource possession, a
phenomenon that has traditionally been difficult to solve with analytic
queueing network models. Other complicating factors include having to model the
fact that, in some algorithms, reads conflict only with pre-commits while
pre-commits conflict with both reads and pre-commits. The problem of restarting a
transaction (non-conservation of work) and estimating the probability of
deadlock also makes the application of traditional queueing network modelling
in this situation difficult.

The problems mentioned above deal with the need to model properties of
the algorithms to determine some of the important delays encountered by the
transactions in the system. Another difficulty with this two-site queueing
network model is the necessity for obtaining accurate values for the parameters
that characterize the workload. The problem of good workload characterization
is difficult. However, despite this lack of precise workload information and the
difficulty with capturing some of the algorithmic and scheduling properties in
an analytic model, several useful comparisons are still possible. In Chapter 5 a
technique for comparing CCMs will be presented.

4.6.  Description  of  Algorithms 

The algorithms to be considered can be found in [BeGo80, Tho79, LeLa76,
GeSe78, RSL78]. A brief discussion of each of the algorithms is presented,
followed by a section where the algorithms are broken down into the operations
described above. Complete descriptions may be found in the referenced papers.

A few ideas are common to a number of algorithms and deserve special
attention. The first is two-phase locking. (This is not to be confused with
two-phase commitment.) A method is a two-phase locking method if it requires that
a transaction lock a data item before using it and that a transaction not claim
any additional locks after it releases one. The name is appropriate, since the
method splits the transaction into two phases, a growing phase during which all
locks are claimed and a shrinking phase during which the locks are released. If
two-phase locking (2PL) is enforced, then Eswaran et al. show that
inconsistencies arising in the database may not be attributed to the concurrency control
algorithm [EGLT76]. Bernstein and Goodman describe several ways to use 2PL
for concurrency control, among them Basic 2PL with Primary Copy 2PL for
write synchronization, and Centralized 2PL [BeGo80]. Also, Rosenkrantz,
Stearns, and Lewis incorporate the 2PL restriction in their algorithms [RSL78].
When 2PL is used there is a possibility that a transaction may fail to terminate,
because of deadlock or cyclic restart, for example.
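
The two-phase rule can be checked mechanically on the sequence of operations a transaction issues. The sketch below assumes a hypothetical encoding of that sequence as `(action, item)` pairs with actions `"lock"`, `"use"`, and `"unlock"`; the function name `is_two_phase` is likewise an assumption made for illustration.

```python
def is_two_phase(operations):
    """Return True if the operation sequence obeys the two-phase locking rule."""
    held = set()
    shrinking = False            # becomes True once any lock has been released
    for action, item in operations:
        if action == "lock":
            if shrinking:
                return False     # claimed a lock after releasing one
            held.add(item)
        elif action == "use":
            if item not in held:
                return False     # used a data item without locking it first
        elif action == "unlock":
            if item not in held:
                return False
            held.remove(item)
            shrinking = True
        else:
            raise ValueError(f"unknown action: {action}")
    return True


legal = is_two_phase([("lock", "x"), ("use", "x"), ("lock", "y"),
                      ("unlock", "x"), ("use", "y"), ("unlock", "y")])
relocks = is_two_phase([("lock", "x"), ("unlock", "x"), ("lock", "y")])
unlocked_use = is_two_phase([("use", "x")])
```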

The simplest form of deadlock occurs when a transaction has claimed some
resources and is waiting for an additional resource that is held by another
transaction that is in turn waiting for a resource held by the first transaction. More
complicated situations can arise when a cycle of transactions T(1), ..., T(N) forms
such that each transaction T(I) is waiting for T(I+1) for I = 1, ..., N-1 and T(N) is
waiting for T(1). In order for any transaction to make further progress in this
situation some transaction must be aborted and restarted. Information about
which transactions are active and which locks are held must be available.
Detection and prevention of deadlocks in a distributed environment is even
more difficult than in a centralized system because the information about
which transactions are waiting for other transactions must be periodically
communicated among the database sites.*

Deadlock can be handled in many ways. Preventive deadlock techniques
involve examining the transaction to determine if a data request by the
transaction could produce a deadlock. If it could, the transaction is restarted or
delayed; otherwise, it is executed. Deadlock detection techniques periodically
examine the transactions that are running to see if a deadlock is present. If
there is a deadlock, then some transaction is chosen to be restarted. One
"nice" property of deadlocks is that once they are present they remain forever
if nothing is done to remove them. For this reason, if deadlocks are infrequent
(as they are thought to be), then the deadlock detection mechanism may be
invoked infrequently, and would not cause much overhead. The tradeoff is that
those transactions that are caught in a deadlock will be delayed abnormally
long if the detector is invoked too infrequently.
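
A periodic deadlock detector of the kind described amounts to a search for a cycle in the waits-for relation. The following sketch assumes that relation is available in one place as a dictionary mapping each transaction to the set of transactions it is waiting for; the function name `find_deadlock` and this representation are hypothetical, and the choice of which transaction to restart is left open.

```python
def find_deadlock(waits_for):
    """Return a list of transactions forming a waits-for cycle, or None."""
    UNSEEN, ACTIVE, DONE = 0, 1, 2
    state = {}
    path = []

    def visit(txn):
        state[txn] = ACTIVE
        path.append(txn)
        for other in waits_for.get(txn, ()):
            status = state.get(other, UNSEEN)
            if status == ACTIVE:
                # `other` is on the current path, so the path closes a cycle.
                return path[path.index(other):]
            if status == UNSEEN:
                cycle = visit(other)
                if cycle:
                    return cycle
        path.pop()
        state[txn] = DONE
        return None

    for txn in list(waits_for):
        if state.get(txn, UNSEEN) == UNSEEN:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None


no_cycle = find_deadlock({"T1": {"T2"}, "T2": set()})
simple = find_deadlock({"T1": {"T2"}, "T2": {"T1"}})
longer = find_deadlock({"T4": {"T1"}, "T1": {"T2"}, "T2": {"T3"}, "T3": {"T1"}})
```

Because a deadlock persists until it is broken, a routine like this can be run at whatever interval balances detection overhead against the delay suffered by the blocked transactions.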

Another technique that can be used to preserve consistency in a DDBMS is
that of timestamp ordering (T/O). A linear ordering of operations is made (e.g.,
according to the relative time of issue at each site or a set of sequence
numbers) and each site is required to perform conflicting operations in order
according to the timestamp. Some algorithms (Bernstein and Goodman Basic
T/O with the Thomas Write Rule, Bernstein and Goodman Multi-Version T/O,
Gelenbe and Sevcik, and the Rosenkrantz, Stearns, and Lewis algorithms) allow
the operations to proceed out of order as long as a conflict does not occur. A
conflict is possible between two transactions A and B if their read and write sets
intersect. A conflict between A and B is said to occur if, for example, A is
running at a site and B arrives and tries to run, or vice versa. If a conflict occurs
then the conflicting requests must be processed in timestamp order, possibly
necessitating a rejection of the operation or a transaction restart. Other
algorithms (Bernstein and Goodman Conservative T/O, Le Lann) require a strict
ordering on the events so that no conflicts can occur and no restarts are
necessary. (See Pessimistic vs. Optimistic Algorithms section below.)

Timestamps can be assigned to data items and/or transactions and/or
individual transaction operations. When a read (write) timestamp is assigned to
a data item, it indicates the time of the last operation that read (wrote) the
data item. A timestamp that is assigned to an operation indicates the time of
initiation of the operation. A timestamp is generated by appending the site
number to the clock time so that if two timestamps have the same clock time
they can be ordered by their site number. The assumption is made that the
clock granularity is sufficiently fine that a site cannot issue two requests in one
clock tick [Lam78].
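
Appending the site number to the clock time gives a pair that can be compared lexicographically. A minimal sketch, assuming integer clock ticks and integer site numbers (the type name `Timestamp` and its field names are hypothetical):

```python
from typing import NamedTuple


class Timestamp(NamedTuple):
    """Clock time first, site number second, so that tuple comparison orders
    by clock time and breaks ties by site number."""
    clock: int
    site: int


same_tick = Timestamp(clock=5, site=1) < Timestamp(clock=5, site=2)
later_tick = Timestamp(clock=5, site=9) < Timestamp(clock=6, site=0)
ordered = sorted([Timestamp(6, 0), Timestamp(5, 2), Timestamp(5, 1)])
```

Under the assumption that a site never issues two requests in one clock tick, no two timestamps compare equal, so the ordering is total.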

4.6.1. Bernstein and Goodman - Basic 2PL with Primary Copy 2PL for write
synchronization

In Basic 2PL, transactions submit dm-reads and pre-commits in
accordance with the 2PL rules described above. These operations implicitly request
read or write locks on the data items to be read or written. If the requested
lock cannot be granted then the transaction is placed in a queue for that data
item. As a result, deadlock is possible and a transaction may have to be
restarted. (See discussion of deadlock above.) Note that a transaction need only
obtain a read lock on the copy of the data item it actually reads, while to update
a data item requires write locks on all copies of the data item. The locks are
released when a site processes an update request from a transaction that has
received acknowledgements for all its pre-commits (or for queries, when an
END is processed).

*This is not true for some algorithms, e.g., Centralized 2PL [BeGo80], where all waits-for
information is kept at one central site.

As  a  modification  to  the  Basic  2PL  algorithm,  one  copy  of  each  data  item 
can  be  designated  the  "primary  copy".  Then,  before  a  transaction  updates  a 
copy  of  the  data  item  it  must  obtain  a  lock  on  the  primary  copy  of  the  data  item 
instead  of  all  the  copies  as  in  Basic  2PL.  The  primary  copy  site  then  acquires 
locks  at  all  the  other  database  sites  on  the  required  data  items.  By  having  the 
transactions  claim  a  lock  on  the  primary  copy  first,  some  deadlock  situations 
that  would  have  occurred  using  Basic  2PL  may  be  avoided  and  the  number  of 
restarts  that  are  necessary  will  be  decreased. 

For example, consider three sites A, B, and C with transactions T(1) and
T(2). Suppose both transactions want to write data item x at all three sites.
Using Basic 2PL, T(1) might try to lock data item x at site A, then B, then C.
T(2), at the same time, tries to lock data item x at site C, then B, then A.
Suppose that T(1) gets the lock at A and B and T(2) gets the lock at C. This causes a
deadlock and either T(1) or T(2) must be restarted.

If Primary Copy 2PL were used and the primary copy of data item x was at
site A then both T(1) and T(2) would be forced to request the lock at site A first.
Deadlock would not have occurred since only one of T(1) or T(2) would have
been allowed to proceed. Deadlocks can still occur with Primary Copy 2PL, but
will involve two or more data items. The modified algorithm is referred to as
Basic 2PL with Primary Copy 2PL for write synchronization.

4.6.2.  Bernstein  and  Goodman  -  Centralized  2PL 

In  Centralized  2PL  all  the  lock  information  is  kept  at  one  central  site.  The 
two-phase  locking  rule  holds  and  deadlock  must  be  prevented  or  controlled. 
Deadlock  prevention  or  detection  is  much  simpler  for  Centralized  2PL  than  for 
most  of  the  other  distributed  2PL  techniques  since  all  the  lock  information  is 
local  to  the  central  site.  In  addition,  with  a  Centralized  2PL  algorithm  the  lock 
releases  can  all  be  sent  in  a  single  message  to  the  one  central  site.  Obviously, 
with Centralized 2PL, it would be desirable for the central site to keep a copy of
each  data  item.  As  well,  all  reads  should  be  done  at  the  central  site  since  they 
will  have  to  request  their  read  locks  there. 

4.6.3. Bernstein and Goodman - Basic T/0 with the Thomas Write Rule

Basic  T/0  is  a  timestamp-based  synchronization  technique.  Every  read 
request,  pre-commit,  and  update  request  is  tagged  with  a  timestamp  indicating 
the  time  of  submission  for  the  transaction  and  the  site  number  where  the 
request  originated.  Each  database  keeps  track  of  the  largest  (most  recent) 
timestamp  of  each  read  and  pre-commit  for  each  data  item.  These  are  referred 
to  as  the  read  timestamp  and  write  timestamp,  respectively.  The  scheduler 
then  compares  the  timestamp  of  the  transaction  with  the  data  item  timestamp. 
If  the  request  is  a  read  and  the  write  timestamp  is  larger  than  that  of  the  tran¬ 
saction  or  if  the  request  is  a  pre-commit  and  the  read  or  write  timestamp  is 
larger  than  that  of  the  transaction,  then  the  request  is  rejected.  Otherwise,  the 
operation  is  performed.  If  the  request  is  a  read  then  the  read  timestamp  is 
updated while if the request is a pre-commit then the transaction writes the
data items to secondary storage, sends a pre-commit acknowledgement to the
processing site and waits for the associated update request. During the wait all
incoming  read  requests  are  queued  until  the  corresponding  update  request 
arrives.  Each  of  the  queued  requests  is  then  processed  as  if  it  had  just  arrived 
at  the  site. 

A  rejected  transaction  can  be  restarted  with  a  larger  timestamp.  By  choos¬ 
ing  a  new  timestamp  that  is  quite  a  bit  larger,  it  is  more  likely  that  the  same 
two  transactions  will  not  conflict  again.  However,  a  greater  number  of  other 
transactions  may  need  to  be  restarted  as  a  result  of  allowing  the  rejected  tran¬ 
saction  to  have  an  artificially  large  timestamp. 

The  Thomas  Write  Rule  is  only  useful  in  synchronizing  pre-commits;  how¬ 
ever  it  can  and  should  be  used  in  conjunction  with  other  algorithms  [Tho79].  If 
a  pre-commit  has  a  smaller  timestamp  than  the  write  timestamp  of  the  data 
item  then  the  Thomas  Write  Rule  says  the  request  may  be  ignored  instead  of 
rejected  as  in  Basic  T/0.  This  is  obviously  an  improvement  to  Basic  T/0  and 
there  is  no  reason  for  Basic  T/0  to  be  implemented  without  the  Thomas  Write 
Rule.  The  improved  algorithm  is  referred  to  as  Basic  T/0  with  the  Thomas  Write 
Rule. 
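
The accept/reject test described above can be sketched for a single data item as follows. This is a minimal illustration under stated assumptions: timestamps are plain integers rather than (time, site) pairs, the names `Item` and `schedule` are hypothetical, and the queuing of reads while a pre-commit waits for its update request is left out.

```python
from dataclasses import dataclass


@dataclass
class Item:
    read_ts: int = 0   # largest timestamp of any accepted read
    write_ts: int = 0  # largest timestamp of any accepted pre-commit


def schedule(item, op, ts, thomas=True):
    """Return "accept", "reject" or "ignore" for one request on one item."""
    if op == "read":
        if item.write_ts > ts:
            return "reject"          # a later write has already been accepted
        item.read_ts = max(item.read_ts, ts)
        return "accept"
    # op == "pre-commit"
    if item.read_ts > ts:
        return "reject"              # a later read has already seen the old value
    if item.write_ts > ts:
        # Thomas Write Rule: an obsolete write is dropped, not rejected.
        return "ignore" if thomas else "reject"
    item.write_ts = ts
    return "accept"
```

For instance, after a pre-commit with timestamp 8 is accepted, a pre-commit with timestamp 6 is ignored when `thomas=True` and rejected (forcing a restart) when `thomas=False`, provided no read with a timestamp above 6 has been accepted.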

4.6.4.  Bernstein  and  Goodman  -  Multi-Version  T/0 

In  Multi-Version  T/0,  each  site  keeps  track  of  a  set  of  read  timestamps  and 
a  set  of  write  timestamps  along  with  the  value  that  was  written  or  read  at  the 
time.  Read  requests  are  never  rejected  since  they  can  always  read  the  correct 
version  of  the  data  item,  i.e.,  the  value  that  was  written  with  the  largest  time- 
stamp  less  than  the  timestamp  of  the  read  request.  Writes  can  be  rejected, 
however,  if  they  are  about  to  write  a  data  item  that  a  completed  read  request 
has  already  read.  In  other  words,  suppose  a  data  item  has  a  write  timestamp  T 
and a read timestamp T+2. If a write request arrives with a timestamp of T+1,
then it must be rejected because the read with timestamp T+2 was already
processed.
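
The read and write rules above can be illustrated with a short sketch for one data item. The class name `MultiVersionItem`, the use of integer timestamps greater than zero, and the assumption that every timestamp is unique are illustrative choices, not part of the original description.

```python
import bisect


class MultiVersionItem:
    """Versions of one data item, kept in write-timestamp order."""

    def __init__(self, initial):
        self.write_ts = [0]      # timestamp 0 stands for the initial version
        self.values = [initial]
        self.read_ts = []        # timestamps of reads already processed

    def read(self, ts):
        # Reads are never rejected: return the version with the largest
        # write timestamp smaller than ts (ts is assumed to be > 0).
        i = bisect.bisect_left(self.write_ts, ts) - 1
        self.read_ts.append(ts)
        return self.values[i]

    def write(self, ts, value):
        i = bisect.bisect_left(self.write_ts, ts)
        later = self.write_ts[i] if i < len(self.write_ts) else float("inf")
        # A processed read r with ts < r <= later was served the version
        # just before ts, so this write has arrived too late.
        if any(ts < r <= later for r in self.read_ts):
            return False
        self.write_ts.insert(i, ts)
        self.values.insert(i, value)
        return True
```

Taking T = 10 in the example above: after a write at 10 and a read at 12, a write at 11 returns `False`, while a write at 13 is accepted and does not disturb what a reader at 12 sees.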

The  record  of  all  the  timestamps  and  values  consumes  a  lot  of  storage,  but 
this  requirement  can  be  decreased  by  letting  a  single  timestamp  represent  a 
collection  of  data  items.  To  alleviate  the  storage  requirement  further,  the  old 
timestamp-value  pairs  should  be  forgotten.  If  the  requests  from  each  site 
arrive  in  the  order  they  were  sent,  then  one  possible  way  to  know  which 
timestamp-value  pairs  are  old  and  are  candidates  to  be  forgotten  is  to  keep 
track  of  when  a  read  request  has  been  received  from  all  sites  since  a  particular 
time.  After  each  of  these  times,  the  site  will  never  receive  a  read  request  for 
values  of  data  items  from  before  the  time  and  the  old  values  may  be  forgotten. 
If  sites  that  are  not  too  active  send  null  operation  requests,  then  the  number  of 
versions  that  need  to  be  kept  around  could  be  kept  to  a  minimum.  These  null 
operation  requests  merely  provide  synchronization  information.  They  need  not 
read  or  write  information  to  the  database.  If  the  requests  are  not  guaranteed  to 
arrive  in  order,  then  messages  can  be  numbered  so  that  missing  ones  can  be 
detected. 

4.6.5.  Bernstein  and  Goodman  -  Conservative  T/0 

Conservative  T/0  is  different  from  the  other  T/0  techniques  described  so 
far  because  the  timestamps  are  assigned  only  to  the  operations  and  not  to  the 
data  items.  Also,  in  Conservative  T/0,  transactions  are  never  restarted  (or 
rejected).  Each  site  maintains  a  pair  of  queues  for  each  other  site  in  the  net¬ 
work:  one  for  read  requests  and  one  for  pre-commits.  Operations  from  each 
other  site  are  assumed  to  be  received  in  timestamp  order,  and  then  put  into 
either the read queue or the pre-commit queue associated with the site. Thus,
if a pre-commit queue for a particular site has a request with timestamp T and
the  read  queue  from  that  same  site  is  empty,  then  that  implies  that  the  site  did 
not  intend  to  send  any  read  requests  before  time  T. 

The  processing  of  operations  continues  as  follows  at  each  site.  The  site 
scheduler  scans  its  queues  for  the  operation  with  the  lowest  timestamp.  It  then 
processes  that  operation  as  long  as  at  least  one  queue  in  each  pair  is  not 
empty.  If  both  queues  are  empty,  then  the  associated  site  may  have  an  older 
transaction  that  wants  to  send  a  read  request  or  a  pre-commit  and  the 
scheduler  must  wait  to  hear  from  that  site.  The  scheduler  may  request  a  site 
to acknowledge that it is still functioning, or alternatively, each site may
periodically send null operation requests that indicate that the site is current
up  to  the  time  of  the  operation.  As  in  Multi-Version  T/0,  the  null  operation 
requests  are  used  only  for  synchronization  purposes  and  need  not  read  or  write 
the  database. 
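
The scheduling loop just described can be sketched as a single selection step. The data layout (a dictionary of queue pairs holding `(timestamp, payload)` tuples) and the name `next_operation` are assumptions made for this illustration; a null operation is represented by a payload of `None`, and at least one other site is assumed to exist.

```python
from collections import deque


def next_operation(queues):
    """Pick the next operation to process at one site, or None to wait.

    queues maps each other site to {"read": deque, "pre-commit": deque};
    every deque holds (timestamp, payload) tuples in timestamp order.
    """
    heads = []
    for site, pair in queues.items():
        if not pair["read"] and not pair["pre-commit"]:
            # Both queues empty: this site may still send something older.
            return None
        for kind, q in pair.items():
            if q:
                heads.append((q[0][0], site, kind))
    ts, site, kind = min(heads)      # lowest timestamp over all queue heads
    return queues[site][kind].popleft()
```

A site with nothing to send unblocks the others by appending a null operation, for example `queues["C"]["pre-commit"].append((7, None))`.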

The  main  advantage  of  Conservative  T/0  is  that  transactions  are  never  res¬ 
tarted.  The  disadvantage  is  that  each  site  can  only  run  a  transaction  when 
there is a request from each other site in its queue. The degree of parallelism
at a site using Conservative T/0 is limited by the length of the shortest queue
at the site.

4.6.6.  Gelenbe  and  Sevcik  -  Aggressive  T/0 

Gelenbe and Sevcik proposed a technique (for fully redundant DDBMSs)
that is characterized by two policies: an ordering policy and a release policy.
The  ordering  policy  specifies  the  order  in  which  the  updates  are  to  be  applied 
and  the  release  policy  specifies  when  an  update  that  is  received  at  a  site  should 
be  applied  to  the  database.  The  ordering  policy  is  that  the  updates  are  to  be 
applied in timestamp order. The release policy is to "release the update the
maximum of either R time units after origination or the time of arrival of the
request at the site". If some update requests conflict, then the conflicting
requests with the later timestamps must be rejected. If the rejected update
request has already been applied it must be undone. All the updates that read
the  results  of  the  undone  updates  must  also  be  undone.  This  can  have  a  cas¬ 
cading  or  domino  effect  causing  a  number  of  restarts  to  occur.  In  this  algo¬ 
rithm,  a  transaction  cannot  cause  a  transaction  at  another  site  to  be  rejected. 
If  an  update  request  is  rejected,  then  the  transaction  must  be  restarted.  By 
adjusting the value of R, several different behaviors are attainable. For
example, when R is large, the probability of restart is low but the delay is
large. When R is small, however, the delay may be small but the frequency of
restarts is
high.  Since  a  large  value  of  R  produces  an  algorithm  that  is  similar  to  Conser¬ 
vative  T/0,  a  small  value  of  R  will  be  assumed  whenever  Aggressive  T/0  is  men¬ 
tioned. 
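
The effect of R can be made concrete with a small sketch. The release rule is the one quoted above; the helper `out_of_order`, the use of the origination time as the timestamp, and the sample arrival times are assumptions introduced only for illustration.

```python
def release_time(origin, arrival, R):
    """Release policy: the later of (origin + R) and the arrival time."""
    return max(origin + R, arrival)


def out_of_order(updates, R):
    """Count updates released after an update with a later timestamp.

    updates is a list of (origin_time, arrival_time) pairs; the origin
    time doubles as the timestamp. Each counted update may force an undo
    if it conflicts with a later-stamped update that was already applied.
    """
    released = sorted(updates, key=lambda u: release_time(u[0], u[1], R))
    count, newest = 0, float("-inf")
    for origin, _ in released:
        if origin < newest:
            count += 1
        newest = max(newest, origin)
    return count


updates = [(0, 9), (1, 2), (2, 3)]   # the oldest update arrives last
print(out_of_order(updates, R=0))    # small R: released out of order
print(out_of_order(updates, R=10))   # large R: timestamp order restored
```

With R = 0 the late arrival of the oldest update leaves one update out of timestamp order; with R = 10 every update is held long enough that none is, at the cost of a delay of at least 10 time units per update.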

4.6.7. Le Lann - Tickets

In  the  Le  Lann  ticketing  algorithm  the  DDBMS  is  viewed  as  a  virtual  ring  of 
processors.  There  is  a  token  that  is  passed  around  the  ring  that  distributes 
tickets  to  the  sites.  Transactions  issue  requests  to  retrieve  data  items  (read 
requests)  and  to  update  data  items  (write  requests).  To  submit  a  read  or  write 
request  a  site  must  assign  a  ticket  number  to  the  request.  If  there  are  no  tick¬ 
ets  available  at  the  site  when  the  site  wants  to  send  a  request  then  the  request 
must  wait  until  the  token  comes  around  again.  Before  a  request  is  processed  at 
a  site,  all  requests  with  ticket  numbers  less  than  the  received  request  must  be 
received and checked to see if they conflict. If they conflict, then lower
numbered requests are executed first. This means that if the token completes a
revolution and there are unused tickets at the site then the unused tickets
must  be  sent  immediately  with  null  requests. 
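
The role of null requests can be seen in a simplified sketch of the receiving side. As an assumption made here for brevity, the site processes requests strictly in ticket order rather than only reordering conflicting ones; the class name `TicketSite` and the use of `None` for a null request are likewise illustrative.

```python
class TicketSite:
    """Buffers requests until every lower ticket number has been seen."""

    def __init__(self):
        self.next_ticket = 1
        self.buffer = {}         # ticket number -> request (None = null request)

    def receive(self, ticket, request):
        """Store a request and return the requests that can now be executed."""
        self.buffer[ticket] = request
        ready = []
        while self.next_ticket in self.buffer:
            req = self.buffer.pop(self.next_ticket)
            if req is not None:  # null requests only fill gaps in the order
                ready.append(req)
            self.next_ticket += 1
        return ready
```

A request holding ticket 4 stays buffered until ticket 3 arrives, which is why a site must send its unused tickets as null requests once the token has completed a revolution.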

4.6.8.  Thomas  -  Majority  Consensus 

Thomas'  algorithm  is  also  called  the  Majority  Consensus  algorithm.  It  has 
only  been  discussed  in  the  context  of  a  fully  redundant  DDBMS.  In  this  algo¬ 
rithm,  the  transaction  accesses  the  local  database  for  all  the  data  items  to  be 
used  in  its  update  computation.  These  data  items  are  called  the  base  variables. 
It  also  keeps  track  of  the  timestamps  of  these  data  items.  The  transaction  com¬ 
putes  the  values  of  the  data  items  to  be  updated  and  then  sends  update 
requests  to  each  of  the  sites.  In  order  for  an  update  to  be  performed  it  is 
necessary  for  the  transaction  to  get  a  majority  of  sites  to  accept  the  update. 
The  voting  rule  guarantees  that  a  site  will  not  vote  to  accept  for  both  of  two 
conflicting  transactions.  Hence,  it  is  not  possible  for  both  of  two  conflicting 
transactions to obtain a majority of votes since the intersection of two
majorities has at least one site in common, which would imply that a site voted to
accept  each  of  two  conflicting  transactions. 

There  are  three  kinds  of  votes  that  a  site  can  make  when  it  is  asked  to  vote 
on an update request: accept, reject, or pass. To decide how to vote, the site
compares  the  timestamps  of  the  readset  variables  with  the  corresponding  time- 
stamps  in  its  database.  If  the  request  conflicts  with  a  request  that  has  already 
obtained  a  majority,  the  site  must  vote  reject.  If  the  request  conflicts  with  a 
younger  request  that  has  not  yet  attained  a  majority  but  which  the  site  has 
voted  on  (a  pending  request)  then  the  site  must  vote  pass.  If  the  request  does 
not  conflict  with  any  other  request  at  the  site  then  the  site  votes  accept. 

Voting  may  also  be  "deferred"  if  the  request  uses  values  that  have  not  yet 
been  made  current  by  some  other  transaction  at  the  particular  site  where  the 
update  request  is  being  processed  or  if  the  request  conflicts  with  a  request  that 
is  older,  but  not  yet  terminated.  Note  that  the  reason  that  voting  is  deferred  if 
the request conflicts with an older one is that, if the older request is restarted
because  it  did  not  attain  a  majority,  then  an  accept  vote  can  still  be  given  to 
the  deferred  request.  If  the  conflicting  request  is  younger,  a  pass  vote  must  be 
cast  to  avoid  a  possible  deadlock.  Since  older  requests  are  never  deferred  for 
younger  ones,  deadlocks  cannot  occur. 
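
The voting rule, including deferral, can be summarized in a sketch. The argument layout, the function names, and the order in which the tests are applied (for example, deferring when both an older and a younger pending request conflict) are assumptions made here; smaller timestamps denote older requests.

```python
def vote(request_ts, base_ts, db_ts, pending):
    """Return this site's vote on one update request.

    base_ts: item -> timestamp the transaction saw when it read the item
    db_ts:   item -> timestamp currently stored at this site
    pending: timestamps of conflicting requests this site has voted on
             or deferred, and that are not yet resolved
    """
    for item, seen in base_ts.items():
        if db_ts[item] > seen:
            return "reject"      # a conflicting update already won a majority
    for item, seen in base_ts.items():
        if db_ts[item] < seen:
            return "defer"       # values used are not yet current at this site
    if any(p < request_ts for p in pending):
        return "defer"           # conflicts with an older, unresolved request
    if pending:
        return "pass"            # conflicts only with younger pending requests
    return "accept"


def has_majority(votes, n_sites):
    """True once more than half of all sites have voted accept."""
    return votes.count("accept") > n_sites // 2
```

Because a younger request never causes an older one to be deferred, the waiting relation follows timestamp order and cannot form a cycle.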

When  a  majority  is  attained,  then  each  site  is  notified.  The  update  is 
applied  to  the  database  and  all  the  votes  that  were  deferred  due  to  this  update 
are  changed  to  reject.  If  a  majority  is  no  longer  possible  then  each  site  is 
notified  that  the  transaction  was  rejected,  and  all  the  conflicting  requests  that 
were  deferred  are  reconsidered.  This  algorithm  is  robust  in  the  face  of  site 
failure  as  well,  since  only  a  majority  of  sites  at  which  that  data  item  is  stored 
need  to  be  functioning  to  accept  an  update. 

4.6.9. Rosenkrantz, Stearns and Lewis - Wait or Die and Wound or Wait

Rosenkrantz, Stearns, and Lewis propose two methods, Wait or Die and
Wound or Wait. These methods use the two-phase locking rule and also
timestamp the requests.

In  the  Wait  or  Die  method,  if  a  transaction  makes  a  request  and  there  is  a 
conflict,  then  if  the  requesting  transaction  is  older,  the  requestor  waits  for  the 
conflicting  transactions  to  finish.  Otherwise,  the  requestor  is  restarted. 
Deadlock  never  occurs  because  there  is  a  strict  order  among  the  transactions 
and younger transactions never wait for older transactions. Among a finite
number of processes, the youngest will not wait for any other transaction.
Therefore, all processes will eventually terminate.

In the Wound or Wait method, if a transaction makes a request and there is
a conflict, then if the requesting transaction is older and the conflicting
transactions have not sent out their update requests, then the conflicting
transactions are restarted. Otherwise, the requestor waits for the conflicting
transactions to finish. Deadlock does not occur in this algorithm either. The
proof is given in [RSL78].
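
The two conflict rules can be written side by side as decision functions. This is a sketch only: the function names, the string results, and the single-holder simplification are assumptions, and a smaller timestamp is taken to mean an older transaction.

```python
def wait_or_die(requester_ts, holder_ts):
    """Wait or Die: an older requester waits, a younger one is restarted."""
    if requester_ts < holder_ts:
        return "requester waits"
    return "requester restarts"


def wound_or_wait(requester_ts, holder_ts, holder_sent_updates=False):
    """Wound or Wait: an older requester restarts the holder unless the
    holder has already sent out its update requests; otherwise it waits."""
    if requester_ts < holder_ts and not holder_sent_updates:
        return "holder restarts"
    return "requester waits"
```

In both functions a transaction only ever waits in one direction of the timestamp order (older for younger in Wait or Die, younger for older in Wound or Wait, apart from holders that are already committing), which is what rules out a cycle of waiting transactions.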

4.7. Pessimistic vs. Optimistic Algorithms

An algorithm will be called optimistic if its performance relative to other
algorithms  is  better  when  there  are  few  conflicts  than  when  there  are  many 
conflicts.  A  pessimistic  algorithm  is  one  that  assumes  there  are  many  conflicts 
and  is  designed  to  work  best  in  that  situation.  Kung  and  Robinson  applied  the 
idea  of  optimistic  and  pessimistic  algorithms  to  concurrency  control  methods 
in an attempt to find alternatives to locking algorithms [KuRo81]. They claimed
that locking algorithms were pessimistic in general and that certain timestamp
ordering  methods  were  optimistic.  An  example  of  a  pessimistic  algorithm 
presented  here  would  be  Conservative  T/0.  In  Conservative  T/0,  the  read 
requests  and  pre-commits  are  applied  in  exact  timestamp  order.  The  algorithm 
treats all transactions as if they conflict with every other transaction. Its
performance is just as good when there are lots of conflicts as when there are
only a few conflicts. On the opposite end of this scale lies an algorithm such as
Aggressive  T/0  (when  used  with  a  small  value  of  R).  In  Aggressive  T/0,  the 
updates are applied as soon as they arrive at a site unless a conflict occurs. If a
conflict  occurs,  then  the  conflicting  requests  with  the  later  timestamps  are 
rejected.  Obviously,  if  there  are  many  conflicts,  this  algorithm  will  perform 
badly.  This  is  an  optimistic  algorithm  since,  relatively  speaking,  it  works  far 
better  when  there  are  only  a  few  conflicts  than  when  there  are  many  conflicts. 

The  Le  Lann  algorithm  is  another  example  of  a  pessimistic  algorithm.  The 
ticketing  mechanism  generates  an  ordering  on  the  transactions  and  the  sites 
process  the  transactions  in  exactly  that  order,  making  the  implicit  assumption 
that  all  transactions  conflict. 

Bernstein and Goodman's Basic T/0 with the Thomas Write Rule, Thomas'
Majority Consensus, and the Rosenkrantz, Stearns and Lewis algorithms
provide good examples of optimistic algorithms, though possibly not as optimistic
as  Aggressive  T/0.  The  requests  are  allowed  to  go  uninterrupted  until  a  conflict 
is  found.  When  a  conflict  is  found,  some  transactions  are  restarted  or  rejected. 
The more conflicts there are, the worse these algorithms will perform. The
Centralized 2PL algorithm is neither strongly optimistic nor pessimistic. Multi-Version
T/0  can  be  considered  pessimistic  since  the  reason  it  keeps  the  old  values  of 
data  items  around  is  that  it  assumes  the  database  is  changing  fast  and  there 
will  be  a  lot  of  conflicts. 

4.8.  Breakdown  of  Each  Algorithm  into  Operations 

Each  of  the  operations  in  Figure  4.2  will  be  considered  separately  for  each 
algorithm.  The  actual  step-by-step  breakdown  for  each  algorithm  is  given  in 
Appendix B. Subsequently, the performance of the algorithms will be
investigated.

4.8.1. DM sends a read request elsewhere

In  a  fully  redundant  DDBMS,  it  would  be  unusual  for  a  transaction  to  send  a 
read  request  elsewhere.  However,  if  the  DDBMS  is  partially  redundant,  then  the 
required  data  item  may  not  reside  at  the  processing  site  and  a  remote  read 
request  would  be  necessary.  Both  the  Aggressive  T/0  algorithm,  and  the  Major¬ 
ity  Consensus  algorithm  have  only  been  discussed  in  the  context  of  a  fully  repli¬ 
cated  database  system. 

A  DM  sends  the  read  request  to  the  site  from  which  data  is  to  be  read  (the 
data  source  site)  and  then  waits  for  a  reply.  If  the  read  request  is  rejected,  the 
transaction is restarted. A read request may be rejected because of site failure,
deadlock  prevention  or  detection,  or  in  some  algorithms,  it  is  rejected  if  it 
arrived  too  late  and  the  value  that  the  transaction  should  have  read  has  already 
been  overwritten.  Conservative  T/0  has  one  other  embellishment.  If  no  read 
request or pre-commit is ready within a certain pre-defined amount of time, a
null  operation  request  is  sent  to  indicate  that  all  requests  up  until  that  time 
have  been  sent. 

4.8.2. DM receives a read request from elsewhere

When  a  data  source  site  receives  a  read  request  from  another  site  it  must 
either  retrieve  the  data  from  the  database  and  send  it  to  the  processing  site  or 
reject  the  request.  The  rejection  can  occur  as  a  result  of  deadlock  prevention 
or  detection  (for  2PL  techniques)  or  if  the  data  is  out  of  date  (for  timestamp 
ordered  techniques).  Note  that  read  requests  are  never  rejected  in  Multi- 
Version  T/0  since  the  history  of  the  data  items  as  they  evolve  is  kept,  and  the 
correct version can always be read. Conservative T/0 and the Le Lann
algorithm are more complicated. In these algorithms, reads can never be rejected
because the operations are all performed in a fixed order that is assigned a
priori, i.e., timestamps in Conservative T/0 and ticket numbers for Le Lann.

If  the  request  is  not  rejected,  then  the  2PL  techniques  place  a  read  lock  on 
the  requested  data  items  and  most  timestamp-ordered  methods*  update  the 
read  timestamp  of  the  data  items  read. 

4.8.3.  DM  sends  a  pre-commit  elsewhere 

Pre-commits  are  necessary  to  insure  that  the  updates  to  the  stored  copy  of 
the  database  for  a  transaction  are  either  all  completed  or  not  initiated.  The 
mechanism  used  to  guarantee  this  is  to  first  send  a  pre-commit  to  all  the  desti¬ 
nation  sites  that  indicates  that  a  commit  (or  update  request)  follows  and  to 
inform  the  destination  site  to  lock  those  data  items  that  are  pre-committed 
until  the  associated  update  requests  arrive.  The  pre-commit  also  informs  the 
destination  sites  to  write  the  information  necessary  to  perform  the  updates  on 
non-volatile  secondary  storage.  Then,  in  case  of  site  failure  the  information 
will  still  be  accessible  to  the  site  when  it  returns  to  normal. 

When  a  DM  sends  a  pre-commit  elsewhere,  it  includes  all  the  information 
about which data items to update and the values to be assigned. These
messages must be sent to each of the destination sites. For Basic 2PL with Primary
Copy 2PL for write synchronization, the pre-commits are first sent to the
primary copy site. The primary copy site then forwards the pre-commit messages
to each of the other destination sites in the network. In Centralized 2PL, the
pre-commits are first sent to the central site and then forwarded to the other
destination sites.

*In Conservative T/0 and Tickets it is only necessary to remember the time or ticket
number of the last transaction processed.

In  the  other  algorithms,  the  processing  site  sends  pre-commits  to  all  of  the 
destination  sites  and  then  waits  for  replies  from  all  of  them  so  that  the  process¬ 
ing  site  knows  that  the  data  items  have  all  been  written  to  secondary  storage. 
If  any  of  the  sites  rejected  the  pre-commit,  then  all  the  other  sites  to  which  the 
pre-commit  was  sent  must  be  informed,  since  they  are  waiting  for  the  associ¬ 
ated  update  which  will  not  be  forthcoming  as  a  result  of  the  rejection.  A  desti¬ 
nation  site  may  reject  a  pre-commit  because  of  deadlock,  site  failure  (possibly 
detected  by  a  timeout),  or  in  some  algorithms,  if  a  later  transaction  already 
read  a  data  item  that  is  about  to  be  updated. 

In  Majority  Consensus,  since  only  a  majority  of  sites  is  needed  to  accept  an 
update,  the  pre-commits  can  be  daisy  chained  from  site  to  site  in  an  attempt  to 
obtain a majority. When a majority of sites vote to accept the pre-commit, the
processing site is informed. Similarly, the processing site is informed if a
majority is no longer attainable, e.g., if a majority vote not to accept the pre-commit.

4.8.4.  DM  receives  a  pre-commit  from  elsewhere 

In Basic 2PL with Primary Copy 2PL for write synchronization and Centralized
2PL, the primary copy site and the central site, respectively, play a special
role. These sites forward the pre-commits to the other destination sites. When
all the destination sites acknowledge successful completion, the primary copy
or central
site  sends  an  acknowledgement  to  the  processing  site.  If  there  is  a  rejection, 
then  these  sites  send  a  rejection  message  to  the  processing  site.  Possible  rea¬ 
sons  for  rejection  are  given  in  the  preceding  section. 

When  a  DM  receives  a  pre-commit  from  elsewhere  it  writes  the  data  items 
to non-volatile secondary storage. If this is successful (no site failures,
conflict,  or  deadlock),  then  the  pre-commit  is  accepted  and  a  message  is  sent  to 
the  processing  site. 

Pre-commits  are  not  necessary  in  Aggressive  T/0  because  a  transaction 
can  be  undone.  In  this  way,  it  is  possible  to  insure  that  no  transactions  are  left 
only  partially  completed. 

4.8.5.  DM  sends  an  update  request  elsewhere 

After  the  pre-commits  have  been  accepted  at  all  the  destination  sites  the 
associated  update  requests  may  be  sent.  This  update  request  informs  the  desti¬ 
nation  sites  that  they  may  then  copy  the  data  items  from  secondary  storage  to 
the database to be permanently stored. At the same time, read locks can be
released at any sites that were locked for read but not write. If the database is
fully  replicated,  then  this  will  not  be  necessary  except  in  the  case  of  queries. 

4.8.6.  DM  receives  an  update  request  from  elsewhere 

When  a  DM  receives  an  update  request  it  copies  the  information  from 
secondary  storage  (written  there  by  the  associated  pre-commit)  to  the  database 
to  be  permanently  stored.  Then,  any  locks  that  were  held  for  this  transaction 
are released and any write timestamps are updated if necessary.
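
The pre-commit and update exchange of Sections 4.8.3 to 4.8.6 can be sketched for the simple case in which the processing site contacts every destination site directly. The names `DataManager` and `run_transaction`, the dictionary standing in for non-volatile storage, and the explicit `abort` message are assumptions made for illustration; site failures and timeouts are not modelled.

```python
class DataManager:
    def __init__(self):
        self.database = {}   # permanently stored copy
        self.stable = {}     # txn -> pending writes (stand-in for stable storage)
        self.locked = {}     # item -> txn holding it between pre-commit and update

    def pre_commit(self, txn, writes):
        """Save the writes and lock the items; False means the pre-commit is rejected."""
        if any(self.locked.get(k, txn) != txn for k in writes):
            return False
        self.stable[txn] = dict(writes)
        for k in writes:
            self.locked[k] = txn
        return True

    def update(self, txn):
        """Copy the saved writes into the database and release the locks."""
        self.database.update(self.stable.pop(txn))
        self._release(txn)

    def abort(self, txn):
        """The associated update will not arrive: discard writes, release locks."""
        self.stable.pop(txn, None)
        self._release(txn)

    def _release(self, txn):
        for k in [k for k, t in self.locked.items() if t == txn]:
            del self.locked[k]


def run_transaction(sites, txn, writes):
    """Processing site: pre-commit everywhere, then update or inform of rejection."""
    accepted = [s for s in sites if s.pre_commit(txn, writes)]
    if len(accepted) == len(sites):
        for s in sites:
            s.update(txn)
        return True
    for s in accepted:
        s.abort(txn)
    return False
```

Either every destination site applies the update or none does, which is the property the pre-commit step is meant to provide.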

A detailed cookbook-style breakdown of each algorithm is presented in
Appendix B.

Chapter Five

Comparison of Several Distributed
Concurrency Control Mechanisms


In  the  previous  chapter,  a  framework  for  describing  distributed  con¬ 
currency  control  algorithms  was  presented.  Several  representative  algorithms 
were  then  cast  into  this  framework  (see  Appendix  B).  Now  the  performance  of 
several  of  these  algorithms  will  be  investigated.  The  technique  to  be  used  in 
this investigation proceeds by first determining those factors that affect the
performance  of  a  distributed  CCM  (CCM-determining  factors).  These  factors 
include  processor,  network,  and  transaction  characteristics.  After  the  CCM- 
determining factors have been outlined and described, the goal will be to
determine the impact of each factor on each CCM. These results will prove to be
useful to a system analyst in helping to identify CCMs that would perform badly and
enable him to consider only CCMs that can take advantage of the distinguishing
system characteristics at hand. Finally, to show the usefulness of the
technique, a number of example systems will be analyzed to produce a
recommendation for a CCM that should be used for each system.

Figure  5.1  lists  some  of  the  factors  that  may  be  useful  in  determining 
which  CCM  a  DDBMS  should  use.  In  the  next  sections,  the  factors  will  be 
analyzed  separately  and  in  Tables  5.1  and  5.2  the  results  of  the  analysis  will  be 
summarized.  Table  5.1  shows  the  system  characteristics  and  Table  5.2  shows 
the  transaction  characteristics.  The  tables  show  which  factors  are  detrimental 
and  which  are  advantageous  to  each  algorithm  and  to  some  extent  the  degree 
to  which  the  algorithm  is  affected.  The  objective  then  is  to  use  a  characteriza¬ 
tion  of  the  DDBMS  and  Tables  5.1  and  5.2  to  eliminate  some  choices  of  CCM  algo¬ 
rithms  or,  even  better,  to  use  the  table  to  choose  some  algorithm  that  stands 
over  and  above  all  other  algorithms.  It  is  possible  that  some  factors  will  lead  to 
a choice of algorithm that is undesirable because of other factors that also
apply.  An  example  of  this  situation  and  a  discussion  of  how  to  deal  with  it  will 
be  presented  later  in  this  chapter. 

The correct method for interpreting the entries of Tables 5.1 and 5.2 is the
following. Compare the table entries in each row for a characteristic. If, by
changing rows within a particular characteristic, the value for a particular
algorithm goes up, then the algorithm looks better when the second characteristic
is present. The entries in the tables are somewhat subjective and the basis for
choosing each entry is discussed below. The simulation study discussed later in
this chapter was also used to verify various table entries.

5.1. System Characterization

Distributed  Database  Management  Systems  can  be  characterized  according 
to  many  different  criteria.  In  this  section,  several  system  characteristics  that 
may make the choice of CCM easier will be identified and described. In Section
5.2, those CCM-determining factors that have to do with characterizing
transactions will be discussed.

5.1.1. Number of Sites

In  a  DDBMS  with  a  large  number  of  sites,  the  performance  of  certain  con¬ 
currency  control  algorithms  will  be  affected.  For  example,  the  Majority  Con¬ 
sensus  algorithm  may  not  be  impacted  as  much  as  the  other  algorithms  since 
it  only  needs  a  majority  of  sites  to  accept  an  update  request.  Centralized  2PL  is 
much more likely to behave badly when the number of sites is large because all
requests must pass through the central site. Unless the central site is much
more powerful than local processing requires, Centralized 2PL may not be
suitable for networks with a large number of sites.

[Figure: System and workload characteristics that affect CCM performance.]

Tickets  will  also  be  degraded  when  there  are  a  large  number  of  sites  since 
the  token  will  require  a  long  time  to  traverse  the  ring  of  sites  while  dispensing 
tickets.  It  will  also  be  difficult  to  estimate  how  many  tickets  to  leave  at  each 
site,  since  the  time  until  the  next  arrival  of  the  token  will  be  long  or  may  have 
a  high  variance. 

Conservative  T/0  needs  to  have  a  request  pending  from  every  site  before  it 
can  proceed  with  the  request  with  the  earliest  timestamp.  Having  a  large 
number  of  sites  could  make  this  wait  unnecessarily  long. 

If  the  network  is  made  up  of  only  a  few  sites,  then  Basic  2PL  with  Primary 
Copy  is  affected  slightly.  Because  it  is  necessary  to  pre-commit  the  primary 
copy  of  the  data  item  before  the  data  item  at  the  other  sites,  there  is  a  slight 
amount  of  parallelism  lost  that  can  be  implemented  in  other  algorithms.  When 
the  number  of  sites  in  the  network  is  large,  the  extra  delay  may  not  be  as 
significant.  No  other  algorithms  exhibit  any  serious  flaws  with  only  a  small 
number  of  sites;  however  none  of  them  can  take  advantage  of  it  either  and 
other  factors  should  be  considered. 

5.1.2.  Data  Replication 

If the DDBMS is partitioned or partially redundant, meaning that each site
does  not  maintain  a  copy  of  every  data  item,  then  some  algorithms  can  be  elim¬ 
inated  immediately  since  they  have  only  been  considered  when  the  DDBMS  is 
fully  redundant.  These  are  the  Aggressive  T/0  and  Majority  Consensus  algo¬ 
rithms.  Since  the  central  site  keeps  a  copy  of  each  data  item.  Centralized  2PL 
has  the  advantage  that  each  site  does  not  have  to  keep  track  of  which  site 
stores  each  data  item.  Each  site  only  has  to  remember  which  site  is  the  cen¬ 
tral  site. 

If the DDBMS is fully redundant, then no clear distinction is possible, at
least not solely on the basis of the full redundancy. Other factors must be
considered.

5.1.3.  Communication  Capacity 

If  the  network  is  heavily  utilized  and  nearly  saturated,  then  it  is  desirable 
for the CCM to send as few messages as possible. Basic T/O with the Thomas
Write Rule, Multi-Version T/O, Wound-Wait, and Wait-Die all require extra
messages when a request is rejected. Basic 2PL with Primary Copy only requires
an extra message when a request is rejected because of deadlock, and is
therefore not affected as strongly; however, the waits-for information must be
transmitted between sites. In Aggressive T/O, rejections do not require
notification, as they are sorted out at each site based on the timestamp. If
the network is lightly utilized, the only algorithm that is affected is
Conservative T/O. It can take advantage of the lightly loaded network by
sending out lots of synchronization information. The synchronization
information can help to make Conservative T/O operate more efficiently.

5.1.4.  Network  Topology 

Most  of  the  algorithms  considered  are  reasonably  specific  about  the  logical 
network  topology  where  they  will  work  well.  For  example,  the  Tickets  algorithm 
will  only  work  with  a  logical  ring  topology.  All  the  other  algorithms  are 
designed  to  work  in  a  broadcast  network. 

In a point-to-point network, the only algorithm that will do well is Majority
Consensus,  since  it  may  only  have  to  communicate  with  a  majority  of  the  sites. 
Its  advantage  will  increase  with  the  length  of  the  transmission  delays  in  the 
system.  A  store  and  forward  network  can  be  considered  an  example  of  a  point- 
to-point  network. 

Majority Consensus will not operate as well as Tickets in most ring
environments, especially if the transmission delays are long; however, it may not do as
badly  as  the  other  algorithms.  A  ring  that  is  local  and  has  short  transmission 
delays  will  operate  like  a  broadcast  network  because  the  time  until  the  token  or 
message  can  traverse  the  ring  will  be  small. 

5.1.5.  Communication  Delays 

The  characteristics  of  the  communication  delay  in  the  distributed  network 
can  have  a  dramatic  effect  on  the  choice  of  CCM.  For  example,  consider  the 
case where the sites in the network are geographically remote or the
interconnections between them are slow, producing high transmission times for
messages between sites.

Centralized 2PL (when there are few read requests), Conservative T/O, and
the Majority Consensus algorithm (with a broadcast network) are impacted
less by higher transmission times than the rest of the algorithms under
consideration. This is not to say that these algorithms do better when the
transmission times are high, only that the detrimental impact is not as great
as for some of the other algorithms. Centralized 2PL is reasonable because the
waits-for information that needs to be transmitted between sites in other 2PL
methods* is not necessary in Centralized 2PL. Centralized 2PL would become
impractical if there were many read requests, as these incur the added expense
of needing to be transmitted to the central site. Conservative T/O may be good
because there are no rejections; hence the network is not as cluttered with
the extra messages necessary to take care of rejections and restarts. One
negative aspect for Conservative T/O is that the synchronization messages, if
they are necessary, will take a long time to arrive.

Several  algorithms  will  perform  badly  when  the  transmission  times  are 
known to be high. Basic T/O with the Thomas Write Rule, Multi-Version T/O,
Wound-Wait, and Wait-Die have the problem that rejections require a long time
to be sorted out. Tickets is heavily impacted because the token will circulate
less quickly, necessitating a better estimate of the number of tickets to
leave at each site on its traversal. This estimate is crucial to the operation
of the Tickets algorithm. The high transmission delays will cause many
rejections in Aggressive T/O, thereby causing each site to undo work that was
already completed and creating extra work at the sites.

If  the  transmission  times  are  insignificant,  then  Centralized  2PL,  Tickets 
and the Majority Consensus algorithm with a daisy chained communication
network could have better performance. Centralized 2PL will improve since one
of its drawbacks is that reads have to be sent to the central site. Since the
cost of transmitting these reads would be low in this case, the algorithm
should perform well. The Tickets algorithm will perform better because the
token will circulate more quickly and the task of estimating the number of
tickets to leave at each site is easier. The other algorithm that could
improve is the Majority Consensus algorithm when used in a daisy chained
(point-to-point) communication environment. A daisy chained communication
protocol passes a message from site to site in a network, in contrast to a
broadcast protocol that makes the message available to all sites at nearly the
same time. Since only a majority of sites are needed to accept an update, if
the request is passed from one site to another and the cost of this
transmission is insignificant, then the algorithm will place only a small load
on the system and updates will be accepted quickly with little overhead placed
on the sites compared to other CCMs.


*Wound-Wait and Wait-Die do not require transmission of waits-for information; however,
they will be seen to have problems because rejections will take a long time to sort out.

It  may  also  be  the  case  that  the  transmission  times  in  the  network  have  a 
high  variance,  so  that  they  do  not  fall  under  either  of  the  above  categories. 
This turns out to be detrimental to most of the CCMs. The chance of rejection
and restart is increased in Basic T/O with the Thomas Write Rule, Multi-Version
T/O, Aggressive T/O, Majority Consensus, Wound-Wait, and Wait-Die. In Tickets,
the task of estimating the number of tickets to leave at each site becomes
difficult if the time until the token makes a complete revolution around the
ring is unknown. Conservative T/O will perform badly as well, since the
transactions get processed in sequential order and, if any operation requests
or replies are required, the processing will progress roughly at the speed of
the slowest communicating site.

5.1.6. Processing Capacity of Sites

Distributed database management systems are created for a number of
reasons. For example, two companies that each operate their own computer
system  may  merge  to  form  a  new  company.  The  newly  formed  company  may 
decide  that  it  would  like  to  combine  the  two  systems  into  a  distributed  network, 
and  the  processing  power  of  the  two  computer  systems  may  be  different.  In  this 
section,  the  implications  of  having  all  processors  equal  or  having  one  powerful 
processor  and  the  rest  not  as  powerful  will  be  considered  with  respect  to  the 
choice  of  CCM. 

If  all  the  processors  have  equal  processing  capability  and  there  is  a  heavy 
load on the system, then Centralized 2PL may not be able to handle the
additional load placed on the central site. There is an additional load at the central
site  because  all  requests  must  obtain  locks  there  before  they  proceed. 

If,  on  the  other  hand,  one  processor  is  much  more  powerful  than  the  rest, 
then  Centralized  2PL  may  be  the  best  choice.  This  sort  of  system  could  arise  in 
the  example  above  when  a  company  that  employs  a  centralized  system  merges 
with  a  company  that  uses  a  distributed  network  made  up  of  small  machines. 
Basic  2PL  with  Primary  Copy  would  also  perform  well  in  this  environment  if  all 
the  primary  copies  were  at  the  powerful  machine.  Whether  it  would  outperform 
Centralized  2PL  would  depend  on  the  number  of  read  requests.  If  there  were  a 
large number of read requests, this would be bad for Centralized 2PL since the
read  requests  require  transmission  to  the  central  site,  whereas  in  Basic  2PL 
with  Primary  Copy,  the  read  requests  can  be  processed  locally.  The  tradeoff  is 
that the deadlock detector in Centralized 2PL would be centralized and
therefore much simpler. In addition, the deadlock detector would be executed on a
more  powerful  machine.  The  distributed  deadlock  detector  required  in  Basic 
2PL  with  Primary  Copy  also  requires  that  extra  messages  be  sent  to  pass  on 
information  to  the  deadlock  detectors  at  each  site. 

5.1.7. Storage Capacity of Sites

If each site in the DDBMS has plenty of mass storage, then Multi-Version
T/O can be used effectively, as the need for a large amount of storage is a
major drawback for its use. However, if all the sites are short of storage,
then Multi-Version T/O will be impractical. Centralized 2PL will probably not
be viable in these circumstances since at least one site (the central one)
needs to be able to store all the data items or at least locking information
about each of them. The amount of information stored could depend on the
number of active transactions, how the locking information is stored, and the
granularity employed in the database.

If  one  site  has  plenty  of  storage  and  the  rest  are  tight,  then  Centralized 
2PL may be usable; however, Multi-Version T/O will still not be feasible.

5.2.  Transaction  Characterization 

In  this  section  the  different  distinguishing  features  that  transactions  may 
have  will  be  considered.  The  characteristics  of  the  workload  can  have  a  great 
effect  on  the  choice  of  CCM. 

5.2.1.  Workload  Distribution 

The  workload  may  be  distributed  among  the  sites  in  the  network  so  that 
updates  are  submitted  at  all  sites.  This  would  most  likely  be  the  normal  case, 
and none of the algorithms can take advantage of this fact. If, however, most
of the updates are submitted at one site, then certain algorithms have an
advantage. This may happen in a payroll database where paychecks are printed
at the sites where they are to be issued, if there is a convention or
constraint that salaries can only be changed or authorized at the head office
site.

Basic  2PL  with  Primary  Copy  would  be  good  if  all  the  primary  copies  are  at 
the site where the updates are submitted; however, a deadlock mechanism
would still be needed. Centralized 2PL could also do well, except that all
reads must go through the central site. Its deadlock mechanism is centralized,
so the choice between Basic 2PL with Primary Copy and Centralized 2PL would
depend on the rate of read requests.

Other  algorithms  that  would  benefit  from  having  updates  submitted  from 
only one site are Basic T/O with the Thomas Write Rule, Multi-Version T/O, and
Aggressive T/O, since they do well when the requests are received in the order
sent. 
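
To illustrate why in-order arrival helps, the following is a minimal sketch of the textbook form of Basic T/O with the Thomas Write Rule for a single data item. It may differ in detail from the variant studied here (for example, it ignores pre-commits), and the `Item` class and the `read` and `write` functions are names introduced for illustration only.

```python
class Item:
    """Hypothetical data item carrying the largest read and write timestamps seen."""

    def __init__(self):
        self.read_ts = 0
        self.write_ts = 0


def read(item, ts):
    # A read that arrives after a younger write has been applied is rejected.
    if ts < item.write_ts:
        return "reject"
    item.read_ts = max(item.read_ts, ts)
    return "accept"


def write(item, ts):
    # A write that arrives after a younger read is rejected.
    if ts < item.read_ts:
        return "reject"
    # Thomas Write Rule: an obsolete write is ignored instead of rejected.
    if ts < item.write_ts:
        return "ignore"
    item.write_ts = ts
    return "accept"
```

When requests arrive in timestamp order, neither rejection test can succeed, so no restarts occur.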

5.2.2.  Read/Write  Ratio 

There are many ways that the transaction processing load can be
characterized. An important distinction for the purpose of choosing a CCM is
the relative number of read and update requests submitted by transactions. For
example, if it is known that there will be far more updates than read requests,
then the relative merits of Centralized 2PL and Multi-Version T/O are
enhanced.* In Centralized 2PL, one disadvantage is that read requests must be
transmitted to the central site. If it is known that there are relatively few read
requests, then this disadvantage is diminished accordingly.


*If the system designer has the ability to choose, it may be advisable in this case not to
replicate the DDBMS at all and to have a segmented or partitioned database. In a partitioned
DDBMS, the CCM can be quite simple. It will be assumed that the choice of data placement
has already been made.

Multi-Version T/O is improved for a different reason. The only operations
that can get rejected in Multi-Version T/O are pre-commits; however, it is read
requests that cause their rejection. Since there are fewer reads, there may be
fewer rejections.

Another feature that may enable an easier choice of CCM is a workload with
a large number of reads and few updates. This is bad for Centralized 2PL since
it requires all read requests to be sent to the central site, thus
necessitating extra communications. Since reads are never rejected in
Multi-Version T/O, this algorithm should perform better for read requests. The
updates may have trouble completing, however, since if there are many reads,
an update may become obsolete before it can complete. Another benefit for
Multi-Version T/O of having few updates is that not many versions will be kept
around, and the storage requirement will be lessened.
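
The interplay between reads, rejected updates, and stored versions can be sketched with the textbook multi-version timestamp rule for a single data item. This is an illustration under stated assumptions, not the exact protocol evaluated here: the `MultiVersionItem` class is hypothetical, timestamps are assumed to be unique positive integers, and an update is applied in one step where the text speaks of pre-commits.

```python
import bisect


class MultiVersionItem:
    """Hypothetical item keeping every version, ordered by write timestamp."""

    def __init__(self, initial=None):
        self.write_ts = [0]       # write timestamp of each version
        self.read_ts = [0]        # largest timestamp that has read each version
        self.values = [initial]

    def _index(self, ts):
        # Version with the largest write timestamp not exceeding ts.
        return bisect.bisect_right(self.write_ts, ts) - 1

    def read(self, ts):
        # Reads are never rejected; they see the version current at ts.
        i = self._index(ts)
        self.read_ts[i] = max(self.read_ts[i], ts)
        return self.values[i]

    def write(self, ts, value):
        # Rejected if a younger transaction already read the preceding version.
        i = self._index(ts)
        if self.read_ts[i] > ts:
            return False
        self.write_ts.insert(i + 1, ts)
        self.read_ts.insert(i + 1, ts)
        self.values.insert(i + 1, value)
        return True
```

Each accepted update adds one stored version, and only an earlier read can cause `write` to return `False`, which matches the two effects described above.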

In  the  Majority  Consensus  algorithm,  it  is  necessary  to  treat  queries  as 
updates  with  a  null  write  in  order  to  preserve  consistency.  Eager  shows  that 
this  is  equivalent  to  requiring  that  implicit  read  locks  be  set  at  a  majority  of 
sites [Eag81]. This is bad if there are known to be many read requests.

5.2.3. Workload Interaction

There  is  a  certain  amount  of  interaction  among  the  transactions  that  run 
in  a  DDBMS.  It  may  be  possible  to  classify  a  system  as  having  a  great  deal  of 
interaction.  This  would  mean  that  there  is  a  heavy  load  and  there  are  many 
potential  conflicts  between  the  requests  of  the  different  transactions.  This  may 
occur  because  the  transactions  are  long  and  require  a  lot  of  processing,  so  that 
by  the  time  the  transaction  finishes  processing,  the  variables  it  read  are  no 
longer  current.  Some  algorithms  do  not  allow  this  to  happen.  For  example,  in 
the  2PL  algorithms,  the  read  variables  would  be  locked  until  the  transaction 
was completed. In Basic T/O with the Thomas Write Rule, however, the
transaction will be restarted.

The only algorithms that can really take relative advantage of this
situation are Tickets and Conservative T/O. These algorithms operate on the
premise that there will be many conflicts. They do all operations in order and never
restart  a  transaction. 

If  there  is  only  a  small  amount  of  interaction  among  the  transactions  and  a 
light load, then there would be few conflicts and Conservative T/O and Tickets
would  be  at  a  disadvantage  since  they  cannot  operate  any  faster  without 
conflicts  than  with  conflicts. 

The  rest  of  the  algorithms  perform  very  well  without  conflicts,  especially 
Aggressive T/O, which was designed assuming that conflicts would be rare and
capitalized  on  that  assumption. 

5.2.4. Transaction Size

When some transactions request a large number of granules, certain
algorithms perform better than others. The timestamp-based algorithms other
than Conservative T/O behave badly because by the time the long transaction
has finished its processing, the variables it read may not be current and a
restart of the transaction may be necessary. In fact, the large transactions may
be  continually  restarted  and  never  complete.  Centralized  2PL  can  take  relative 
advantage  of  the  large  transaction  because  all  the  locks  are  requested  locally 
at  the  central  site. 


If  all  the  transactions  are  small,  then  none  of  the  algorithms  are  affected 
adversely;  however,  none  of  the  CCMs  can  take  advantage  of  the  situation 
either. 

5.2.5.  Transaction  Length 

Timestamp-based algorithms can behave badly when transactions are long
and  require  a  lot  of  processing  time  for  the  same  reason  given  in  section  5.2.4. 
Short  transactions  are  usually  desirable  and  when  the  transactions  are  short, 
none  of  the  algorithms  are  adversely  affected. 

5.3.  Choosing  a  Concurrency  Control  Mechanism 

In  this  section,  a  number  of  example  systems  will  be  analyzed  to  determine 
if  particular  CCMs  will  be  more  applicable  than  others  because  of  some 
identifiable  distinguishing  features.  In  the  first  example,  the  system  under 
consideration  will  dictate  a  clear  choice  of  CCM.  In  some  of  the  other  examples, 
the  technique  will  be  more  a  matter  of  eliminating  CCMs  that  will  not  perform 
well.  The  result  will  not  always  be  a  single  choice  of  CCM. 

5.3.1. Example One - A Clear Cut Choice

The first step in the investigation is to identify those CCM-determining
factors that apply to the example system. Suppose that System A is found to
have the following characteristics: (1) System A exhibits a high variance in
transmission delays, (2) one site in System A has plenty of storage while the
others do not, and (3) the transactions that are typically run on System A are
known to have mostly writes and few reads.

With  this  information  and  Tables  5.1  and  5.2  it  is  possible  to  eliminate  all 
CCMs  except  Centralized  2PL  because  these  characteristics  are  detrimental  to 
all  the  other  algorithms.  Tables  5.1  and  5.2  show  that  Centralized  2PL  can  take 
advantage  of  the  fact  that  one  site  has  plenty  of  storage.  Therefore,  based  on 
this information, Centralized 2PL would probably be a good choice of CCM for
this  system. 

5.3.2. Example Two - A Choice, But Less Clear Cut

System  B  is  a  partially  redundant  DDBMS  that  uses  a  broadcast  network 
which  has  insignificant  or  low  transmission  times.  Furthermore,  in  System  B 
all  the  sites  are  tight  on  storage.  In  fact,  none  of  the  sites  has  enough  storage 
capability  to  store  all  the  data  items.  This  is  part  of  the  reason  that  the  DDBMS 
is only partially redundant. Most of the updates in System B originate from a
single site located in the head office of the company that owns System B.
System B is heavily loaded most of the time.

These  facts  concerning  System  B  enable  the  system  analyst  to  consult 
Tables  5.1  and  5.2  and  immediately  eliminate  two  algorithms  because  of  the 
partially redundant nature of System B. The two eliminated are Aggressive T/O
and Majority Consensus. Further inspection dictates the elimination of
Centralized 2PL and Multi-Version T/O because of the severe storage
constraints at each of the sites. Tickets requires a ring architecture, so it
can be eliminated since System B is a broadcast network.*


*A physical broadcast network can support a logical ring network; however, it will be
assumed that whenever a network topology is mentioned it is the logical and not physical
structure that is being referred to.

No other algorithms can be eliminated based on the above knowledge of
System B; however, two of the algorithms remaining are able to take relative
advantage of the characteristics of System B.

Basic 2PL with Primary Copy and Basic T/O with the Thomas Write Rule
can take advantage of the fact that most updates originate at one site. The
additional deduction necessary to make the choice between these two
algorithms is that Basic 2PL with Primary Copy requires that all primary
copies be at one site to take advantage of the fact that most updates
originate at that site. It is given in the description of System B that none
of the sites has enough storage to store the entire database. Therefore, Basic
2PL with Primary Copy can be eliminated from consideration and Basic T/O with
the Thomas Write Rule remains as the logical choice of CCM for System B. It
will be seen that this choice agrees with simulation results reported in
section 5.9 where Basic 2PL with Primary Copy and Basic T/O with the Thomas
Write Rule are compared with respect to the throughput of jobs.

5.3.3.  Example  Three  -  An  Unclear  Choice 

System  C  is  a  small,  fully  redundant  DDBMS  that  has  low  transmission 
delays  in  a  broadcast  network.  Most  of  the  requests  are  reads  with  few  updates. 
The  requests  are  submitted  from  all  the  sites,  each  of  which  has  an  identical 
processor.  One  site  has  more  storage  than  the  others.  This  site  is  not  utilizing 
that  storage  while  the  others  are  continually  out  of  storage. 

It turns out that System C has characteristics that preclude isolating a
single CCM that should be used. The best that can be done is to eliminate a few
CCMs from consideration. By considering Tables 5.1 and 5.2 with respect to the
above description of System C, Centralized 2PL, Multi-Version T/O, Tickets, and
Majority Consensus can be eliminated from consideration. Beyond this, no
other demerits can be issued based on the system description. Furthermore,
whichever of the remaining algorithms is chosen, it will probably not perform
much worse than the true "best" choice.

5.3.4. Improvements on an Unclear Choice

After  Tables  5.1  and  5.2  have  been  used  to  eliminate  as  many  algorithms  as 
possible, it still may be the case that several algorithms remain as viable
alternatives. A further narrowing of the choices can probably be done if a clear
bottleneck  exists.  The  bottleneck  can  be  identified  as  the  resource  that  is  most 
heavily  utilized.  Thus,  the  effort  should  concentrate  on  finding  the  algorithm 
that  most  eliminates  the  effect  of  the  bottleneck. 

For  example,  if  the  two  algorithms  that  are  left  after  the  first  elimination 
process are Basic T/O with the Thomas Write Rule and Multi-Version T/O and
the major bottleneck is determined to be that the communication network is
saturated, then a choice is possible. Multi-Version T/O would be superior here
because it causes a smaller number (or equal number) of rejections and
restarts than Basic T/O with the Thomas Write Rule. The proof of this is given
in Appendix C. Fewer rejections would be desirable since this would reduce the
load  on  the  overloaded  communication  network. 

5.3.5. Results From the Tables

Tables 5.3, 5.4, and 5.5 show the results of an experiment to investigate the
usefulness of Tables 5.1 and 5.2. The experiment is also designed to gain
additional insight into the relative performance of the different CCMs. The
experiment consists of two main sections: the first assumes that the system
designer is required to specify an entry for each category in Table 5.1 and
Table 5.2, e.g., he must specify the communication delay as being high, low, or
highly variant, and the network topology. The second part of the experiment
makes the assumption that only a subset of the categories need to be
specified. A computer program enumerates all possible combinations of category
entries. There are 13,824 cases tested for the first part of the experiment
and 1,259,712 for the second. Correlations that exist are not taken into
account, nor are the footnotes in the tables. The values in the tables are
used exactly as they appear to calculate the results.
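
The two case counts are consistent with twelve categories, nine taking two values and three taking three values: 2^9 x 3^3 = 13,824 when every category must be specified, and 3^9 x 4^3 = 1,259,712 when each category may also be left unspecified. This breakdown is inferred from the totals and the section headings and is not taken from Tables 5.1 and 5.2. The category names and values in the sketch below are therefore assumptions.

```python
from itertools import product
from math import prod

# Assumed categories and values (inferred, not copied from Tables 5.1 and 5.2).
CATEGORIES = {
    "number_of_sites": ["large", "small"],
    "data_replication": ["partial", "full"],
    "communication_capacity": ["saturated", "light"],
    "network_topology": ["broadcast", "point-to-point", "ring"],
    "communication_delays": ["high", "low", "variable"],
    "processing_capacity": ["equal", "one powerful"],
    "storage_capacity": ["plentiful", "tight", "one plentiful"],
    "workload_distribution": ["all sites", "one site"],
    "read_write_ratio": ["mostly updates", "mostly reads"],
    "workload_interaction": ["high", "low"],
    "transaction_size": ["large", "small"],
    "transaction_length": ["long", "short"],
}


def cases(categories, allow_unspecified=False):
    """Yield every combination of entries; None marks an unspecified category."""
    extra = [None] if allow_unspecified else []
    names = list(categories)
    for combo in product(*(categories[name] + extra for name in names)):
        yield dict(zip(names, combo))


def count_cases(categories, allow_unspecified=False):
    """Number of combinations, computed without enumerating them."""
    extra = 1 if allow_unspecified else 0
    return prod(len(values) + extra for values in categories.values())
```

Under these assumptions, `count_cases(CATEGORIES)` gives the first total and `count_cases(CATEGORIES, allow_unspecified=True)` gives the second.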

Table 5.3 shows for each algorithm the percentage of cases that the
algorithm survives. The survival of an algorithm is based on considering the
minimum table entry value for all specified categories (or rows) being
considered for the case. If the minimum table entry is greater than or equal
to 3, then the algorithm is marked as a survivor. More than one algorithm may
survive for a particular combination of category attributes, and Table 5.4
shows the percentage of time that 0, 1, 2, ..., 9 algorithms survived. A high
percentage of these systems eliminated all the algorithms due to some workload
or system
characteristic.  For  these  cases,  it  may  be  necessary  to  investigate  the  degree 
to  which  each  algorithm  was  affected  by  each  criterion.  Some  algorithms  may 
not  have  survived  because  they  had  a  2  for  a  particular  characteristic  while 
others  had  a  1.  Though  neither  of  the  algorithms  would  have  survived  in  the 
experiment,  the  first  algorithm  (with  the  2)  would  obviously  be  the  more 
desired  choice  of  CCM. 
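
The survival test, and the fallback comparison for cases where nothing survives, can be sketched as follows. The table layout, the function names, and the sample entries are assumptions made for illustration. The entries are invented for two placeholder algorithms and are not values from Tables 5.1 and 5.2.

```python
SURVIVAL_THRESHOLD = 3  # an algorithm survives if its minimum entry is >= 3


def specified_scores(entries, case):
    """Entries of one algorithm for the categories that the case specifies."""
    return [entries[(cat, val)] for cat, val in case.items() if val is not None]


def survivors(table, case):
    """Algorithms whose entries for all specified categories reach the threshold."""
    return [alg for alg, entries in table.items()
            if all(s >= SURVIVAL_THRESHOLD for s in specified_scores(entries, case))]


def closest_to_surviving(table, case):
    """Algorithms with the highest minimum entry (needs one specified category)."""
    minima = {alg: min(specified_scores(entries, case))
              for alg, entries in table.items()}
    best = max(minima.values())
    return [alg for alg, m in minima.items() if m == best]


# Invented entries for placeholder algorithms "X" and "Y".
table = {
    "X": {("delay", "high"): 2, ("delay", "low"): 4, ("storage", "tight"): 3},
    "Y": {("delay", "high"): 1, ("delay", "low"): 5, ("storage", "tight"): 4},
}
```

For the case `{"delay": "high", "storage": "tight"}` neither placeholder survives, but `closest_to_surviving` prefers "X", whose worst entry is 2 rather than 1.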

When  the  choice  of  algorithm  was  narrowed  down  to  three  or  fewer  algo¬ 
rithms,  the  algorithms  listed  below  survived.  They  are  listed  in  order  of 
decreasing  frequency  of  survival.  The  order  was  the  same  both  when  it  was 
mandatory  for  the  system  designer  to  specify  a  value  for  each  category,  and 
when  it  was  only  necessary  to  specify  a  subset  of  the  category  entries.  The 
order of algorithms is Conservative T/O, Basic 2PL, Majority Consensus, Tickets,
Centralized 2PL, Aggressive T/O, Wait-Die, Wound-Wait, Basic T/O, and
Multi-Version T/O.

Another part of the experiment consists of counting how often each
algorithm comes out with the first, second, or third highest score when it survived.
The  method  for  doing  this  was  to  sum  all  table  entries  and  then  rank  the  three 
highest  remaining  surviving  algorithms.  Table  5.5  shows  the  results  of  this  part 
of  the  experiment. 
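
One reading of this ranking step is sketched below. It assumes that the entries summed are those for the categories specified in the case, that only survivors are ranked, and that ties keep the table's order. None of these details is stated above, and the `top_three` function and the table layout are hypothetical.

```python
def top_three(table, case, threshold=3):
    """Return up to three (algorithm, total) pairs, highest total first."""
    totals = {}
    for algorithm, entries in table.items():
        scores = [entries[(cat, val)] for cat, val in case.items()
                  if val is not None]
        # Only algorithms that survive the minimum-entry test are ranked.
        if all(score >= threshold for score in scores):
            totals[algorithm] = sum(scores)
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:3]
```

Counting how often each algorithm appears in first, second, or third position over all enumerated cases would then give figures of the kind reported in Table 5.5.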

5.4.  Other  Factors  Affecting  the  Choice  of  Algorithm 

The  factors  considered  in  sections  5.1,  5.2  and  5.3  were  all  concerned  with 
the  description  of  the  system  and  workload  in  terms  of  physical  characteristics 
and  attributes.  There  are  other  considerations  that  may  affect  the  choice  of 
CCM  for  a  particular  operating  environment.  For  example,  it  may  be  the  case 
that reliability is an important criterion for the selection of a CCM. Another
factor may be the ease of implementation for each algorithm.

In this section these two additional criteria will be evaluated. First,
reliability will be discussed for each algorithm. The results concerning
reliability are based on work in [Eag81]. Eager found that all the algorithms
he considered could be made robust under certain common types of failures
without much additional effort, except the Majority Consensus algorithm.
Majority Consensus required more work to make it robust. It was pointed out
that most DDBMSs operate in a low failure environment, and in that case,
performance decisions concerning CCMs should be made with the assumption that
failures are infrequent. This was based on the fact that the robust versions
of the algorithms did not impose much overhead compared with the non-robust
versions when there are no failures.

If the system under consideration were known to have a high failure rate,
then it might be necessary to compare the robust versions of the algorithms. It
will be assumed that if the failure rate is high, then it would be more
productive for the system analyst to spend more time trying to make the system more
reliable  rather  than  trying  to  determine  the  best  choice  of  CCM.  Therefore,  the 
assumption  that  the  system  under  consideration  is  reliable  will  be  retained. 

The other criterion is ease of implementation. For this, the only
consideration will be in terms of programming and maintenance, and it will be
assumed that no hardware, communication protocol or major operating system
changes are needed. In other words, if these changes are needed, then the
implementation effort may be considered massive. Otherwise, the difficulty in
implementation is in approximately the order shown in Figure 5.2, with
Centralized 2PL as the easiest to implement and Basic 2PL with Primary Copy as the
most  difficult. 

The approximate order is arrived at by considering the special functions
that the different algorithms require. The programs necessary to provide these
functions are then rated with respect to their difficulty and a relative ranking
is achieved. For example, the problem of distributed deadlock detection is
considered to be quite difficult to program while the centralized deadlock
detection program should be straightforward. The problem of choosing a site to
create a new token if the token disappears in the Tickets algorithm is
considered difficult to handle and rejections are difficult in Aggressive T/O
since it is possible that a completed transaction may need to be rolled back.
The ranking of algorithms in Figure 5.2 is rather subjective but it is believed
to be indicative of the true situation.

Another  consideration  in  choosing  a  CCM  may  arise  from  the  fact  that  the 
system  may  be  undergoing  changes,  so  that,  for  example,  it  is  important  for  the 
CCM  to  operate  well  when  the  storage  at  each  site  is  plentiful  as  well  as  when 
one  site  has  plenty  and  the  rest  are  strapped  for  storage.  This  could  occur  if  a 
company  has  plans  for  adding  memory  to  a  number  of  its  sites  in  the  near 
future.  It  would  be  unfortunate  if  the  CCM  had  been  chosen  to  optimize  the  sys¬ 
tem  under  the  initial  characteristics  when  Tables  5.1  and  5.2  could  have  been 
used  to  choose  a  CCM  that  would  be  satisfactory  in  both  environments. 

5.5. Simulation as an Alternative

It  may  be  impossible  to  narrow  the  choice  of  algorithm  down  to  one  CCM 
even  after  specifying  a  number  of  transaction  and  system  characteristics.  If 
this  is  the  case,  it  may  be  necessary  to  model  the  system  to  compare  the 
remaining  techniques.  It  is  conjectured  that  it  may  be  difficult  to  capture  the 
necessary  detail  needed  to  compare  the  different  algorithms  using  an  analytic 
technique  and  a  simulation  study  may  be  necessary. 

Simulations  have  been  used  in  the  past  to  investigate  some  distributed 
CCMs  [Rie79,  Lin82].  Ries  used  a  different  transaction  processing  model  and 
different  assumptions  to  model  Centralized  2PL  and  Basic  2PL  with  Primary 
Copy.  He  assumed,  for  instance,  that  all  locks  were  predeclared  and  that  the 
data  base  was  partitioned. 

Lin  and  Nolte  use  approximately  the  same  transaction  processing  model  as 
that  used  in  this  thesis;  however,  they  do  not  consider  the  full  impact  of  two- 
phase  commit  in  their  algorithms.  They  also  do  not  take  into  account  the  fact 
that there are a number of sites in the network. They model Basic 2PL (without
Primary Copies), Basic T/O with the Thomas Write Rule, and Multi-Version T/O.
They conclude that the timestamp ordered methods are never feasible to use
except when the transactions are short.

A simulation study will be presented here that evaluates the performance
of Basic 2PL with Primary Copy and Basic T/O with the Thomas Write Rule. This
study was carried out so that some of the entries in Tables 5.1 and 5.2 could be
checked and also to measure the performance of the two algorithms
quantitatively. In many ways it encompasses more detail than the model used in
[Lin82]. This section describes the implementation, results, and conclusions
that can be drawn from the simulation study.

5.5.1. The Simulation Model

The assumptions that are incorporated in the two models will be explained
in  this  section.  The  assumptions  common  to  both  models  will  be  motivated  first 
and  then  each  model  will  be  analyzed  in  more  detail  separately. 

There  are  a  large  number  of  parameters  that  could  be  included  in  a  model 
of  a  DDBMS  CCM;  however,  to  make  the  model  more  usable,  it  is  desirable  to 
consider  a  subset  of  these  parameters  so  that  the  number  of  combinations  of 
parameters  to  try  is  manageable.  One  way  to  do  this  is  by  making  assumptions 
about  the  values  of  the  parameters  being  ignored.  Of  course,  if  the  parameter 
space  is  cut  too  drastically,  then  the  model  may  have  little  relevance  to  the 
real  world  situation  that  is  being  modelled. 

In this simulation study, it is assumed that all sites are identical and that
transactions all choose the same number of granules. The service times for
each request are drawn from an exponential distribution. It is further assumed
that the choice of any particular granule is equally likely and that sequences of
granule  requests  are  not  correlated.  In  other  words,  the  granule  requests  are 
assumed  to  be  independent.  Also,  the  multiprogramming  level  at  each  site  is 
fixed  and  whenever  a  transaction  completes,  another  transaction  replaces  it 
after  some  amount  of  time,  called  the  think  time.  The  value  of  the  think  time 
may  be  near  zero  to  simulate  a  transaction  processing  system  or  much  greater 
than  zero  to  model  a  query  processing  system.  Since  each  site  is  assumed  to  be 
identical,  this  implies  that  the  DDBMS  is  fully  redundant.  The  situation  where 
the  DDBMS  is  still  fully  redundant  but  the  transaction  workload  offered  from 
each  site  is  different  will  also  be  considered. 
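
The workload assumptions above can be illustrated with a minimal sketch. It is not the GPSS program used in the study; the function names are hypothetical, and two details are assumptions made only for illustration: granule requests are drawn with replacement, and the think time is exponentially distributed.

```python
import random


def choose_granules(rng, granules_per_site, requests_per_txn):
    """Draw granule requests independently and uniformly.

    Assumption: draws are made with replacement, so a granule may repeat.
    """
    return [rng.randrange(granules_per_site) for _ in range(requests_per_txn)]


def next_arrival(rng, completion_time, mean_think_time):
    """Closed system with a fixed multiprogramming level: a completed
    transaction is replaced after a think time.

    Assumption: the think time is exponential with the given mean.
    """
    return completion_time + rng.expovariate(1.0 / mean_think_time)
```

A think time mean near zero corresponds to the transaction processing case, and a large mean to the query processing case.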

All  service  times  in  the  models,  with  the  exception  of  the  communication 
delay,  are  load-dependent  in  the  sense  that  they  depend  on  the  number  of  tran¬ 
sactions  currently  obtaining  service  at  the  site.  The  communication  delays  are 
assumed  to  be  exponentially  distributed  with  a  constant  mean.  The  individual 
models  for  each  CCM  will  be  discussed  next  with  each  of  their  additional 
assumptions.  The  simulation  program  listings  and  sample  output  are  included 
as  part  of  Appendix  G  of  this  thesis. 

5.5.2. Simulation of Basic 2PL with Primary Copy

Two  further  assumptions  are  made  for  Basic  2PL  with  Primary  Copy.  The 
first  is  that  the  primary  copies  are  assumed  to  be  evenly  distributed  among  all 
the  database  sites.  The  second  assumption  involves  the  mechanism  used  to 
detect  deadlock.  Instead  of  implementing  a  full-fledged  deadlock  detector  as 
part  of  the  simulation,  the  simulation  program  uses  a  timeout  mechanism  to 
detect  deadlock.  In  other  words,  if  a  transaction  takes  longer  than  some 
specified  time  it  is  assumed  that  the  transaction  is  involved  in  a  deadlock  and 
the  transaction  is  restarted.  Other  than  the  above  assumptions,  the  algorithm 
is  implemented  as  described  in  Chapter  4  using  GPSS.  The  results  of  the  study 
are discussed later in this chapter.

5.5.3. Simulation of Basic T/O with the Thomas Write Rule

No  further  assumptions  are  necessary  for  this  algorithm.  It  is  implemented 
using  GPSS  as  described  in  Chapter  4  and  the  results  of  the  study  follow  in  the 
next  section. 
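
For orientation, the textbook form of basic timestamp ordering with the Thomas Write Rule can be sketched as follows. This is an illustrative assumption, not the Chapter 4 algorithm itself: pre-commit handling and replication are omitted, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Granule:
    read_ts: int = 0      # largest timestamp that has read the granule
    write_ts: int = 0     # largest timestamp that has written the granule
    value: object = None


def read(granule, ts):
    """Reject a read that arrives after a younger write was applied."""
    if ts < granule.write_ts:
        return "reject"
    granule.read_ts = max(granule.read_ts, ts)
    return "ok"


def write(granule, ts, value):
    """Reject a write that a younger read has already missed; under the
    Thomas Write Rule an obsolete write is ignored instead of rejected."""
    if ts < granule.read_ts:
        return "reject"
    if ts < granule.write_ts:
        return "ignore"
    granule.value = value
    granule.write_ts = ts
    return "ok"
```

In this form the accept-or-reject decision needs only the two timestamps stored with the granule, so it can be made as soon as the request arrives.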

5.6.  Results  of  the  Simulation 

Table  5.6  shows  the  parameters  that  are  used  in  both  models.  The  addi¬ 
tional  parameter  MULT  is  needed  for  Basic  2PL  with  Primary  Copy  and  is  the 
amount  of  time  allowed  before  it  is  assumed  that  a  transaction  has  deadlocked. 
This  time  is  measured  from  when  the  transaction  first  enters  the  queue  for  the 
granule.  It  is  specified  as  a  multiple  of  the  service  time  that  could  be  expected 
in  the  absence  of  queueing  for  granules.  A  multiple  of  four  seems  to  work  quite 
well  and  is  used  in  most  runs. 
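
The timeout test just described can be sketched in a few lines. The function names are hypothetical, and the expected service time in the absence of queueing is taken as a given input rather than derived.

```python
def deadlock_timeout(mult, expected_service_time):
    """Time allowed before a transaction is presumed deadlocked."""
    return mult * expected_service_time


def presumed_deadlocked(now, enqueue_time, mult, expected_service_time):
    """True once the wait, measured from when the transaction first
    entered the queue for the granule, exceeds the timeout."""
    return now - enqueue_time > deadlock_timeout(mult, expected_service_time)
```

With MULT equal to four and an expected service time of 60, for example, a transaction would be restarted after waiting more than 240 time units.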

NOS  is  the  number  of  sites  in  the  network  and  ranged  from  2  to  6  in  the 
study.  NOGRA  represents  the  number  of  granules  at  each  site  and  ranged  from  4 
to  40.  MPL  is  the  multiprogramming  level  at  each  site.  It  ranged  from  1  to  20. 
NOLR  is  the  number  of  read  requests,  NOPC  is  the  number  of  pre-commits,  and 
LRS, PCS, and UPS are the service times for reads, pre-commits, and update
requests,  respectively.  NOLR  and  NOPC  were  one  for  most  runs.  The  means  of 
the  exponentially  distributed  (load-dependent)  service  times  are  computed  as 
follows: 


mean read service time = overhead * (number of transactions active at site) + LRS

mean pre-commit service time = overhead * (number of transactions active at site) + PCS

mean update service time = overhead * (number of transactions active at site) + UPS

where overhead was 10 for the tests. LRS and UPS were 60 and PCS was 100 for
most of the tests.
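
As a worked illustration of the load-dependent means, the following sketch computes the mean and draws one service time. The function names are hypothetical; the constants are the values quoted above.

```python
import random

OVERHEAD = 10  # overhead per active transaction, as used in the tests


def mean_service_time(base, active_transactions, overhead=OVERHEAD):
    """Mean of the exponential service time at the current load."""
    return overhead * active_transactions + base


def sample_service_time(rng, base, active_transactions, overhead=OVERHEAD):
    """Draw one exponentially distributed service time with that mean."""
    mean = mean_service_time(base, active_transactions, overhead)
    return rng.expovariate(1.0 / mean)
```

With four transactions active at a site, the mean read service time is 10 * 4 + 60 = 100 and the mean pre-commit service time is 10 * 4 + 100 = 140.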

The  communication  delay  was  assumed  to  be  load-independent  and 
exponentially  distributed  and  was  varied  from  1  to  100  as  part  of  the  study  to 
determine  the  effect  of  higher  transmission  delays  on  the  two  CCMs.  The  think 
time,  THINK,  was  varied  from  1  to  100  to  simulate  heavy  and  light  load.  The 
length  of  the  simulation,  LENGTH,  was  varied  from  10,000  time  units  to  30,000 
time  units  depending  on  the  length  of  the  transactions.  Average  response 
times,  throughputs,  and  the  number  of  deadlocks  or  restarts  were  tabulated 
and  are  reported  as  performance  measures  of  the  system. 

The  particular  parameter  settings  used  in  the  runs  can  be  found  in  Appen¬ 
dix  E  where  the  data  and  results  from  the  study  are  presented  in  tabular  form. 
Tables E.1 - E.6 contain results from the first part of the study and Tables E.7 -
E.12 represent results of extensions described in sections 5.8 and 5.9.

5.7. Conclusions of the Simulation Study

The conclusions presented here are based on data from over 900 runs with
various parameter settings. Figure 5.3 shows throughput versus NOS for three
cases. The curve marked FOUR (2PL) resulted from models where the deadlock
timeout multiple (MULT) is four while the models used to plot the curve marked
TWO (2PL) used a multiple of two. FOUR (2PL) and TWO (2PL) were obtained by
simulating the Basic 2PL with Primary Copy algorithm. The remaining curve,
marked T/O in the figure, indicates the throughput that can be expected from
Basic T/O with the Thomas Write Rule. It is clear from Figure 5.3 that the
performance of Basic T/O is better. Note also that Basic 2PL with Primary Copy
saturates much more quickly, especially if the deadlock detector is not invoked
very often.

Figure 5.3

Graph of Throughput versus number of sites for Basic T/O and Basic 2PL with
Primary Copy with two different deadlock detection timeout multiples

Figure  5.4  shows  throughput  versus  granularity.  As  NOGRA  increases,  the 
number of rejected transactions decreases and a small improvement in
throughput  is  gained  as  expected.  This  is  partly  because  transaction  size  is 
held  constant  and  as  the  granularity  becomes  finer,  the  portion  of  the  database 
accessed  by  each  transaction  becomes  proportionately  smaller. 

Figure  5.5  shows  throughput  versus  MPL.  Basic  2PL  with  Primary  Copy 
saturates  and  the  throughput  decreases  when  the  MPL  at  each  site  is  about  four 
to six. The throughput of Basic T/O does not decrease even when the MPL is ten
at  each  site.  The  communication  delay  is  one  for  Figure  5.5. 

Figure 5.6 shows that as the communication delays are increased, Basic
T/O throughput is affected more, relatively speaking, than Basic 2PL with
Primary Copy; however, this is because the waiting time for locks is the
dominant factor for Basic 2PL with Primary Copy and not the communication
delay. Basic 2PL with Primary Copy experiences a very slight increase in the
number of deadlocks and only at heavy load. Basic T/O has more rejections as
the communication delay increases, especially at high load.

A load-independent case was also tried where the service times at each site
for reading, pre-committing, and updating were not dependent on the number of
transactions active at the site but were simply exponentially distributed with a
constant mean. As expected, the throughput increases for both Basic T/O and
Basic 2PL with Primary Copy, especially under heavy load for Basic T/O. The
impact under heavy load for Basic 2PL with Primary Copy was not as great since
the transaction spent most of its time waiting in queue at the granules and the
slowdown due to competing for compute power was not as significant.

Several other observations can be made from the results of this study.
Basic 2PL with Primary Copy has lower throughput in all cases tested. This is
due in part to the fact that it generally takes longer to detect that a
transaction should be restarted than in Basic T/O, since in Basic T/O the
decision about whether a transaction request should be restarted is made almost
immediately on arrival at the site, whereas in Basic 2PL with Primary Copy the
request must be found to be in a deadlock before it is restarted. Deadlocks may
not occur frequently but there may be a significant time lapse before one is
detected.

The average response time for transactions that complete does not change
much as the communication delay increases. This is because only the
transactions that complete are included in this average and those are most
likely those transactions with low communication delays.

One of the assumptions used in this model was that all transactions are
identical. Under this assumption, when a transaction is rejected it is allowed to
restart from the beginning, ignoring its past history, its past service times, the
particular granules it chose, etc. This procedure of memoryless restart would
also be reasonable if, when transactions are rejected, they are simply rejected
and not restarted by the system. If some transactions are much larger than
others (either longer service times or more granule requests), then it is
conjectured that the impact of having a high variance in transaction size will be
much greater on Basic T/O than on Basic 2PL with Primary Copy because there
are more rejections in Basic T/O than there are deadlocks found in Basic 2PL
with Primary Copy. Lin and Nolte reported that Basic T/O was sensitive to
transaction size variance; however, they also reported that Basic 2PL with
Primary Copy was generally superior to Basic T/O [Lin82]. They found that
Basic T/O was especially bad when the transaction size was large. In this study
it has been shown that if the sites are identical and the transactions are small
and all nearly the same size, then Basic T/O outperforms Basic 2PL with Primary
Copy in all cases tested.

Another reason that Basic T/O outperforms Basic 2PL with Primary Copy is
that there is a phase during processing in the two-phase locking algorithm
where the algorithm is not able to incorporate as much parallelism as the
timestamping algorithm. This occurs when Basic 2PL with Primary Copy first
sends a pre-commit to the primary copy site and then the primary copy site
sends all the rest of the sites a pre-commit on behalf of the origination site. In
the Basic T/O algorithm, all the pre-commits can be sent out in parallel from
the origination site. This would appear to be one of the major reasons that
Basic 2PL with Primary Copy is slower than Basic T/O.
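
The cost of the serialized first hop can be estimated with a small calculation. The sketch below is a simplification made only for illustration: it counts one-way communication delays alone, ignores service times and acknowledgements, assumes the primary copy site is remote from the origination site, and uses the standard result that the expected maximum of k independent exponential delays with mean m is m * (1 + 1/2 + ... + 1/k). The function names are hypothetical.

```python
def harmonic(k):
    """k-th harmonic number; zero when k is zero."""
    return sum(1.0 / i for i in range(1, k + 1))


def parallel_fanout_delay(mean_delay, remote_sites):
    """Expected time for pre-commits sent in parallel from the origination
    site to reach every remote site (the Basic T/O pattern)."""
    return mean_delay * harmonic(remote_sites)


def primary_first_delay(mean_delay, remote_sites):
    """Expected time when one pre-commit goes to the primary copy site
    first and the primary then fans out to the remaining sites."""
    return mean_delay + mean_delay * harmonic(remote_sites - 1)
```

Under these assumptions the primary-first pattern is slower by mean_delay * (1 - 1/k) for k remote sites, so the gap widens as sites are added.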

Parallelism can be increased in Basic 2PL with Primary Copy by eliminating
the primary copy feature; however, as mentioned in the description of the
Basic 2PL with Primary Copy algorithm in Chapter 4, the tradeoff is that more
deadlocks are possible. An experiment was carried out to determine whether
having primary copies is desirable. Basic 2PL was simulated without the primary
copies feature. This algorithm will be referred to as Basic 2PL without Primary
Copies. The results, tabulated in Tables E.7 and E.8, are inconclusive as to
when primary copies are useful. Even when the difference is in favor of Basic
2PL without Primary Copies, the difference is still not enough to come anywhere
near the performance that can be expected from Basic T/O for the same system.

One  other  important  observation  is  that  very  different  performance  is 
obtained  when  comparing  a  system  with  X  sites  and  P  granules  at  each  site  with 
a  system  with  P  sites  and  X  granules  at  each  site  even  though  there  are  the 
same  number  of  granules  in  each  system.  This  is  because  the  number  of  pre¬ 
commits  differs  greatly  in  the  two  systems.  In  the  first  system  each  transac¬ 
tion  must  do  X  pre-commits  while  in  the  second  system,  each  transaction  gen¬ 
erates  P  pre-commits.  Recent  distributed  database  models  by  some  authors 
have  only  used  a  count  of  the  total  number  of  granules  in  the  system  instead  of 
keeping  track  of  which  granules  belong  to  each  site.  This  is  a  simplification 
which  can  only  be  done  if  the  DDBMS  is  partitioned.  If  some  of  the  data  is  dupli¬ 
cated,  then  certain  operations  can  be  done  in  parallel  that  otherwise  could  not 
be and the distinction among sites is important.
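
The point can be made concrete with a small sketch. It assumes, as in the fully redundant model above, that each transaction generates one pre-commit per site; the function name and dictionary keys are hypothetical.

```python
def system_shape(sites, granules_per_site):
    """Summarize a fully redundant system by its total granule count and
    by the pre-commits each transaction generates (one per site)."""
    return {
        "total_granules": sites * granules_per_site,
        "pre_commits_per_txn": sites,
    }
```

Two sites with thirty granules each and thirty sites with two granules each both give sixty granules in total, yet the first generates two pre-commits per transaction and the second thirty. A model that records only the total cannot tell them apart.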

5.8.  Relaxing  the  Assumption  that  Transactions  are  All  Identical 

In  the  previous  sections,  when  a  transaction  was  restarted,  it  chose  new 
granules  and  new  service  times  as  if  a  new  transaction  had  entered  the  system. 
This  was  justifiable  under  the  assumption  that  all  transactions  are  identical  or 
if  rejected  transactions  are  not  automatically  restarted  by  the  system.  If  there 
is  a  characterization  of  what  constitutes  a  transaction  that  will  be  rejected, 
then  this  type  of  transaction  was  obviously  being  selected  out  of  the  population 
of jobs, since whenever one was found it was restarted until it was no longer a
rejectable  transaction. 

The simulation was changed to handle this situation by remembering service
times and granule choices and only generating a new timestamp for a restarted
transaction. The service time calculations are slightly different than in
the original model. They are computed as:

read service time = overhead * (number of transactions active at site) + LRS

pre-commit service time = overhead * (number of transactions active at site) + PCS

update service time = overhead * (number of transactions active at site) + UPS

where LRS, PCS, and UPS are now themselves the exponentially distributed
values, whereas in the original model they were constants that entered the
mean of the exponentially distributed service time. The resulting service
times are still load-dependent. For most of the tests, the means of the
exponential distributions for LRS and UPS were 60 and for PCS was 100.
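
A restart that remembers its history can be sketched as follows. The record layout and function names are hypothetical and are not taken from the GPSS program; the sketch only illustrates that the granule choices and the drawn base service times survive a restart while the timestamp is regenerated.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Transaction:
    granules: tuple        # granule choices, kept across restarts
    base_read: float       # exponential draw for LRS, kept across restarts
    base_precommit: float  # exponential draw for PCS, kept across restarts
    base_update: float     # exponential draw for UPS, kept across restarts
    timestamp: int


def restart(txn, new_timestamp):
    """Restart a rejected transaction: only the timestamp is new."""
    return replace(txn, timestamp=new_timestamp)


def read_service_time(txn, active_transactions, overhead=10):
    """Load-dependent read service time built on the remembered draw."""
    return overhead * active_transactions + txn.base_read
```

Because the base draws are retained, a transaction that happened to draw long service times remains long on every retry, which is the effect this section sets out to capture.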

The results for the modified simulation are qualitatively the same as for
the original version. The results are tabulated in Tables E.9 and E.10 in
Appendix E. Figure 5.7 shows throughput versus granularity for a network with
four sites, each site having an MPL of four with varying granularities at each
site from four granules to twenty granules. Communication service delays (COMS)
of 1 and 100 are plotted for each algorithm. Basic T/O has much better
throughput everywhere.

Figure 5.8 shows the performance of DDBMSs where there are two sites with
thirty granules each and the MPL is varied from two to twenty. Throughput
increases monotonically over this range for Basic T/O; however, both Basic 2PL
algorithms saturate and when the load gets high the throughput actually
decreases for Basic 2PL with Primary Copy. Again, networks with communication
service delays of both 1 and 100 are shown for each algorithm. The lines
marked MOD 2PL are for the Basic 2PL without Primary Copies algorithm.

5.9. Relaxing the Assumption that Sites are All Identical

Tables E.11 and E.12 show the results of the simulation when it was modified
so that the sites were not identical. The sites were still assumed to be fully
replicated; however, the workload offered at each site was not identical. To
model this workload difference, the think time at some site was made to be ten
times as long as the think time at each other site. From Tables E.11 and E.12, it
is clear that Basic 2PL with Primary Copy does, in fact, have higher throughput
than Basic T/O with the Thomas Write Rule at very low loads when there are few
conflicts. This is because there are only two sites, one of which is effectively
not offering any processing load, locally or non-locally, and the transactions
are not found to wait due to conflicts. The difference in throughput is very
small and does not even occur until the granularity is sufficiently fine to
almost guarantee no conflicts.

5.10.  Summary  of  Simulation  Results 

The results of this simulation study can be summarized in one sentence,
namely that Basic T/O outperforms Basic 2PL with or without Primary Copies in
almost all cases tested. The above statement, though true, should be qualified
by reiterating the assumptions that were made during the tests. For example,
it was assumed (1) that each granule was chosen with equal probability, (2) that
the primary copies were equally distributed among the sites, (3) that the
DDBMS is fully redundant and the sites identical, and (4) that each transaction
is identical in the sense that each job chooses the same number of granules
and that the service times are all chosen from the same exponential
distribution.

If (1) does not hold in a particular system, then reasoning analogous to
that used in the centralized environment can be used to show that
nonuniformity leads to strictly worse performance. The second assumption
affects only Basic 2PL with Primary Copy. The performance can be improved if
the location of the primary copies is chosen cleverly. The Basic 2PL without
Primary Copies simulation shows that even without Primary Copies and with low
occurrence of deadlock, the improvement is not significant. Thus, assumption
(2) seems to have little impact on the model results.

Assumption (3) does have an impact and other environments should be
considered. A slight relaxation of (3) is investigated in section 5.9. The
implications of (4) have already been discussed in detail in section 5.8.

Previous results reported by Lin claimed that timestamping methods were
strictly worse than the two-phase locking methods. To overcome the large
number of parameters needed to characterize these algorithms, Lin attempted
to determine which parameters were the most relevant and eliminated the rest
from consideration, necessitating fewer runs for the simulation [Lin82]. In
doing this, however, three important problems were not considered. (1) The
service times were assumed load-independent, i.e., the service times did not
depend on the load at each site. (2) The sites were not distinguishable in his
model. (3) Pre-commits were not considered. The second problem is the most
significant. It was mentioned above and is clear from the tabulated results in
Appendix E that two systems, one with P sites and X granules at each site and
the other with X sites and P granules at each site, produce vastly different
performance results yet have the same total number of granules in the database.
These two systems would be modelled with the same parameters in the Lin
model since his model only considers the total number of granules. Lin and
Nolte would not have been able to conclude that Basic T/O is far less sensitive
to the number of sites in the network than Basic 2PL. This behavior can be
observed in Figure 5.3. The third problem mentioned above can have an impact
on performance since sites must wait until all pre-commits are acknowledged
before processing can continue. This waiting time is, of course, dependent on
the number of sites to which the pre-commit is sent.

In conclusion, this detailed simulation study shows that Basic T/O is better
than the two 2PL algorithms in a large number of cases. This indicates that the
Lin model may be leaving out too much detail in striving for simplicity and ease
of implementation.

Several reasons why Basic T/O outperforms Basic 2PL with Primary Copy
became apparent as a result of this study. The first is that Basic 2PL with
Primary Copy allows slightly less parallelism than Basic T/O because the first
pre-commit must be transmitted separately to the primary copy site and processed
first before the other pre-commits can be sent to the other sites. In Basic T/O,
the origination site can send out all the pre-commits in parallel. Removing the
primary copies to increase the parallelism at the cost of a slightly higher
probability of deadlock did not produce significantly better results and Basic
T/O remained the desired choice of algorithm.

Another reason for the higher throughput of Basic T/O is that if a
transaction is to be rejected, it is detected almost immediately on arrival at a
site. Deadlocks, however, are not detected immediately, and much potential
computing opportunity can be lost in the delay.


Chapter  Six 

Future  Directions  for  Modelling  Database  Management  Systems 

and Conclusions


The  major  goal  of  this  thesis  has  been  to  examine  the  effects  of  con¬ 
currency  control  on  the  performance  of  database  management  systems. 
Several factors affect the performance of database management systems and
the  choice  of  concurrency  control  mechanism  is  one  of  them. 

The  correct  choice  represents  a  tradeoff  between  improved  performance 
due  to  additional  concurrent  access  by  the  transactions  versus  the  overhead 
caused by the concurrency control mechanism itself. Other issues arise that
can  raise  the  allowable  concurrency  level,  but  there  is  a  cost  associated  with 
these  improvements.  For  example,  the  choice  of  granularity  is  a  parameter  of 
this  type.  If  the  granularity  is  fine,  then  there  is  a  possibility  for  greater  con¬ 
currency  among  transactions.  This  concurrency  is  obtained  at  the  cost  of  hav¬ 
ing  to  keep  track  of  information  (about  locks,  timestamps,  transactions,  etc.) 
for  each  granule. 

In  the  first  section  of  this  chapter  the  major  conclusions  from  Chapters  2 
and  3  are  reviewed.  In  the  following  section,  the  major  conclusions  from 
Chapters  4  and  5  are  stated.  Then,  finally,  several  areas  for  further  research 
are  suggested. 

6.1. Summary of Conclusions

An  analytic  model  was  used  to  study  the  performance  of  locking  algorithms 
in  centralized  database  management  systems  and  a  framework  was  presented 
for  concurrency  control  mechanisms  in  distributed  database  management  sys¬ 
tems  so  that  their  performance  could  be  evaluated.  A  simulation  model  of  two 
concurrency control methods was used to verify other results and to compare
their  performance  quantitatively. 

6.1.1.  Centralized  Databases 

In  a  centralized  database,  all  processing  takes  place  at  a  single  site.  A 
heuristic  analytic  model  was  derived  to  determine  the  optimum  locking  granu¬ 
larity  for  two  locking  policies.  The  technique  used  to  solve  the  model  was  to 
express  the  average  waiting  time  at  different  points  in  the  system  in  terms  of 
the  waiting  times  at  other  points  in  the  network.  Then,  successive  substitution 
was  used  to  solve  for  the  different  waiting  times.  Performance  measures  such 
as  throughput  and  average  response  time  could  then  be  expressed  in  terms  of 
these  average  waiting  times.  The  model  proved  to  be  inexpensive  to  run,  unlike 
previous  simulation  experiments  of  a  similar  nature. 
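
The successive substitution technique can be illustrated with a generic fixed-point iteration. The two update equations below are a hypothetical toy system chosen only so that the iteration converges; they are not the waiting-time equations of the model derived in this thesis.

```python
def successive_substitution(update, initial, tol=1e-9, max_iter=10_000):
    """Iterate w <- update(w) until successive estimates agree within tol."""
    current = list(initial)
    for _ in range(max_iter):
        nxt = update(current)
        if max(abs(a - b) for a, b in zip(nxt, current)) < tol:
            return nxt
        current = nxt
    raise RuntimeError("iteration did not converge")


def toy_update(waits):
    """Hypothetical system: each waiting time is expressed in terms of
    the waiting time at the other point in the system."""
    w_lock, w_cpu = waits
    return [1.0 + 0.5 * w_cpu, 2.0 + 0.25 * w_lock]
```

Starting from zero, the toy system converges to waiting times of 16/7 and 18/7. Each pass costs only a handful of arithmetic operations, which is consistent with the observation that a model of this kind is inexpensive to run.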

The  original  model  deals  with  a  system  where  locks  are  requested  all  at 
once  and  the  transactions  are  queued  FCFS  until  all  the  locks  are  obtained. 
Transactions  may  not  pass  each  other  in  line  so  deadlock  is  not  possible. 

The  second  model,  which  is  described  as  a  perturbation  of  the  original, 
models  an  algorithm  where  granules  are  requested  all  at  once;  however,  if  the 
transaction  is  blocked  on  any  of  its  requests  it  releases  all  the  granules  it 
requested and tries again.
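
The locking policy of the second model can be sketched as an all-or-nothing request. The lock table representation and function name are hypothetical; the sketch shows only the release-everything-on-conflict behavior, not the analytic model.

```python
def try_lock_all(lock_table, txn_id, granules):
    """Request every granule at once. If any granule is held by another
    transaction, release whatever was just acquired and report failure
    so that the caller can try again later."""
    acquired = []
    for g in granules:
        holder = lock_table.get(g)
        if holder not in (None, txn_id):
            for held in acquired:
                del lock_table[held]
            return False
        if holder is None:
            lock_table[g] = txn_id
            acquired.append(g)
    return True
```

Since a blocked transaction never holds locks while it waits, no cycle of waiting transactions can form under this policy.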

Few,  if  any,  iterative  solution  techniques  of  this  type  have  been  shown  to 
converge.  The  original  model  used  in  this  thesis  has  been  shown  to  converge 
for  all  realistic  systems  and  a  test  for  whether  the  point  of  convergence  is 
unique  is  given. 


Several assumptions were made in the original model. Some of these
assumptions are relaxed and the impact of the assumptions evaluated. For
example, originally the assumption was made that the transaction chose any
particular granule with equal probability. This assumption was relaxed to
allow nonuniform access, and bounds on the performance of the resulting
nonuniform model were derived. The bounds were obtained by solving two
related systems under the uniform assumption. The results of the nonuniform
model clearly showed that, if possible, the system analyst should try to
arrange the granules so that they are accessed as nearly uniformly as possible
by each transaction, i.e., avoid creating a bottleneck granule.
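A small calculation suggests why a bottleneck granule hurts. It is not the nonuniform model of this thesis; it is a simplified illustration that assumes two transactions each pick a single granule independently with the same access probabilities, so that the chance they pick the same granule is the sum of the squared probabilities. The function name and the example distributions are hypothetical.

```python
from typing import Sequence


def collision_probability(access_probs: Sequence[float]) -> float:
    """Probability that two independent single-granule requests coincide."""
    if abs(sum(access_probs) - 1.0) > 1e-9:
        raise ValueError("access probabilities must sum to 1")
    return sum(p * p for p in access_probs)


uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.70, 0.10, 0.10, 0.10]   # one bottleneck granule

p_uniform = collision_probability(uniform)
p_skewed = collision_probability(skewed)
```

With four granules the uniform case gives a collision probability of 0.25, while the skewed case gives 0.52; for a fixed number of granules the sum of squares is smallest when access is uniform.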

6.1.2. Distributed Databases

In  a  distributed  database  management  system,  the  database  activity  may 
take  place  at  many  sites  in  the  system.  In  addition,  the  concurrency  control 
mechanism  may  be  handled  at  one  site  or  it  may  be  distributed  among  the 
sites. 

Many concurrency control solutions have been proposed for distributed
database systems; however, their relative performance characteristics are not
completely understood. A framework is proposed that enables the system
analyst to view an algorithm so that its performance can be studied. Past
performance studies of distributed concurrency control mechanisms have not
considered workload characteristics or system constraints. This thesis identifies
many different criteria that can help the system designer choose a particular
CCM from the many that exist.

A  set  of  tables  is  constructed  so  that,  given  a  description  of  the  system, 
certain  algorithms  can  be  eliminated  that  cannot  operate  in  the  given  environ¬ 
ment.  The  tables  can  also  be  used  to  isolate  from  the  remaining  algorithms 
those  that  can  take  advantage  of  the  system  and/or  workload  characteristics. 
An  additional  table  is  provided  that  indicates  the  authors'  view  on  the  relative 
difficulty  of  implementation  for  the  various  algorithms. 

Two particular CCMs, Basic 2PL with Primary Copy 2PL for write
synchronization and Basic T/O with the Thomas Write Rule, were studied in
detail using a simulation model to reinforce claims made using the framework.
Basic T/O was seen to outperform Basic 2PL in almost all cases tested.

6.2.  Future  Directions 

The  results  of  the  analytic  study  presented  in  Chapters  2  and  3  suggest 
several  areas  for  further  study.  The  model  made  some  assumptions  that  might 
be  relaxed.  For  example,  it  was  assumed  that  the  number  of  granules  chosen 
by  transactions  was  fixed.  It  may  be  possible  to  relax  this  by  supposing  that  the 
transactions  choose  a  number  of  granules  that  is  picked  from  some  distribu¬ 
tion. 

It  is  also  hoped  that  the  technique  used,  based  on  that  of  Bard,  of  forming 
equations  that  can  be  solved  iteratively  will  prove  useful  for  a  variety  of  appli¬ 
cations  that  have  heretofore  been  difficult  to  model  analytically.  Also,  the 
techniques  used  to  show  a  bound  on  the  number  of  fixed  points  and  the  method 
used  to  find  and  test  for  a  unique  fixed  point  should  prove  useful  for  future 
approximate  iterative  analytic  models. 

In  the  distributed  database  area,  the  opportunity  exists  for  researchers  to 
have  input  into  some  of  the  design  decisions  that  are  being  made  by  industry 
because  there  are  not  many  distributed  database  systems  currently  operating. 
The framework and cookbook technique for choosing a CCM described in this
thesis point to many areas for further research. For example, it would be
interesting to pursue the question of the tradeoff between the increased
frequency of deadlock and the availability of primary copies for the Basic 2PL
and Basic 2PL with Primary Copy algorithms. If the frequency of deadlock is
small, then the increased parallelism may dictate the use of the Basic 2PL
without Primary Copies algorithm. Quantitative results, using simulation
models or other techniques, will be necessary to make choices between CCMs
when other techniques fail. It is also hoped that the (subjective) table entries
in Tables 5.1 and 5.2 will be adjusted and refined through future studies.

Another  area  for  research  should  be  to  try  to  decide  on  a  standard  transac¬ 
tion  processing  model  so  that  results  of  different  authors  can  be  more  easily 
compared.  There  are  only  a  few  performance  studies  of  distributed  databases 
currently  in  print;  however,  despite  this,  some  of  them  are  not  readily  compar¬ 
able.  If  a  standard  transaction  processing  model  cannot  be  found,  then  a  study 
of  the  different  models,  their  differences  and  the  implications  of  their 
differences  on  performance  would  be  useful. 

In summary, this dissertation provides insights into the effects of
concurrency control in both centralized and distributed database management
systems. The results of this thesis can be used to guide system designers and
analysts in their implementation and parameterization of database
management systems.
Appendix A

Derivation  of  the  equation  for  X. 

Claim: 
Proof: From equation (5),

Summing the two parts gives the result.

Appendix B

Each of the operations in Figure 4.2 is considered separately for each
algorithm.


DM  sends  a  read  request  elsewhere 


Basic  2PL  with  Primary  Copy  2PL  for  write  synchronization 

send  a  read  request  to  the  site  from  which  data  is  to  be  read  and  await  the 
reply 

if  the  read  request  is  rejected  (due  to  deadlock),  then  restart  the  transac¬ 
tion 

Centralized  2PL 

send a read request to the central site and await the reply

If  the  read  request  is  rejected  (due  to  deadlock),  then  restart  the  transac¬ 
tion 

Basic  T/O  with  the  Thomas  Write  Rule 

send  a  read  request  to  the  site  from  which  data  is  to  be  read  and  await  the 
reply 

if  the  read  request  is  rejected  then  restart  the  transaction 

Multi-Version T/O

send  a  read  request  to  the  site  from  which  data  is  to  be  read  and  await  the 
reply 

Conservative  T/O 

if no read request or pre-commit is ready then send a null operation
request to each site every x seconds

send  a  read  request  to  the  site  from  which  data  is  to  be  read  and  await  the 
reply 

Aggressive  T/O 

the  assumption  is  made  that  all  reading  is  done  locally  since  for  this  algo¬ 
rithm  the  database  must  be  fully  replicated 

Tickets 

if  a  ticket  is  not  available  then  wait  for  the  token  to  issue  more  tickets 
send  a  read  request  to  the  site  from  which  data  is  to  be  read  with  a  ticket 
number  and  await  the  reply 

Majority  Consensus 

the  assumption  is  made  that  all  reading  is  done  locally  since  for  this  algo¬ 
rithm  the  database  must  be  fully  replicated 
Wait or Die

send  a  read  request  to  the  site  from  which  data  is  to  be  read  and  await  the 
reply 

if  the  request  is  rejected  then  restart  the  transaction 

Wound  or  Wait 

send  the  read  request  to  the  site  from  which  data  is  to  be  read  and  await 
the  reply 

if  the  request  is  rejected  then  restart  the  transaction 
DM receives a read request from elsewhere

Basic 2PL with Primary Copy 2PL for write synchronization

if the lock cannot be granted, i.e., it is owned by someone else for write,
then place the read request in a lock queue for the desired data item

if  there  is  a  deadlock  and  this  transaction  is  selected  to  be  rejected,  then 
send  the  reject  message  to  the  processing  site 

when  the  request  gets  to  the  head  of  the  line,  then  mark  the  data  item  as 
locked  for  read 

reply  to  the  processing  site  with  the  value  of  the  data  item 

Centralized  2PL 

At  the  central  site; 

if the lock cannot be granted, i.e., it is owned by someone else for
write, then place the read request in a lock queue for the desired data
item

if there is a deadlock and this transaction is selected to be rejected,
then send a reject message to the processing site

when  a  request  gets  to  the  head  of  the  line,  then  mark  the  data  item 
as  locked  for  read 

reply  to  the  processing  site  with  the  value  of  the  data  item 

Basic T/O with the Thomas Write Rule

if the timestamp of the read request is greater than the write timestamp
of the data item then accept it else reject it

send a message back to the processing site with the reject message or the
value of the data item

update  the  read  timestamp  of  the  data  item  if  the  read  request  was 
accepted 

Multi-Version T/O

send the value of the version of the data item with the largest timestamp
less than that of the read request to the processing site

update  the  read  timestamp  of  the  data  item 

Conservative T/O

place the read request in the read queue associated with the site that sent
the read request

if  any  pair  of  queues  are  empty  then  wait 
process  the  request  with  the  lowest  timestamp 

if the request is a read, then send the value of the data item back to the
processing site

if the request is a pre-commit, then write the data items to secondary
storage and if all is ok, then send an accept message to the processing site,
else if the write was unsuccessful then send a reject message to the
processing site

if the request is a pre-commit then wait for the associated update request
to either arrive or get restarted

Aggressive T/O

the  assumption  is  made  that  all  reading  is  done  locally  since  for  this  algo¬ 
rithm  the  database  must  be  fully  replicated 

Tickets 

wait  for  requests  with  lower  ticket  numbers 

send  the  value  of  the  data  item  to  the  processing  site 

Majority Consensus

the  assumption  is  made  that  all  reading  is  done  locally  since  for  this  algo¬ 
rithm  the  database  must  be  fully  replicated 

Wait  or  Die 

if  there  is  a  conflict  and  the  request  is  before  the  ones  it  conflicts  with, 
then  wait  until  all  the  requests  it  conflicts  with  terminate  or  are  rejected 
else  if  the  request  is  after  the  ones  it  conflicts  with  then  reject  the  request 

send  a  reject  message  or  the  value  of  the  data  item  to  the  processing  site 

Wound  or  Wait 

if  there  is  a  conflict  and  the  request  is  before  one  of  the  requests  it 
conflicts  with  then  restart  the  ones  it  conflicts  with  if  the  one  it  conflicts 
with  has  not  sent  their  update  requests,  else  wait  until  the  conflicting 
requests  terminate  or  are  rejected 

send  a  reject  message  or  the  value  of  the  data  item  to  the  processing  site 
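The Wait or Die and Wound or Wait rules listed above differ only in which party yields when a conflict is found. A minimal sketch follows, assuming a single conflicting holder and that a smaller timestamp means the request is "before" the other; the function names and return strings are hypothetical.

```python
def wait_or_die(requester_ts: int, holder_ts: int) -> str:
    """Wait or Die: an earlier requester waits, a later requester is rejected."""
    return "wait" if requester_ts < holder_ts else "reject"


def wound_or_wait(requester_ts: int, holder_ts: int,
                  holder_sent_updates: bool = False) -> str:
    """Wound or Wait: an earlier requester restarts the holder, unless the
    holder has already sent its update requests; otherwise it waits."""
    if requester_ts < holder_ts and not holder_sent_updates:
        return "restart holder"
    return "wait"


older_vs_younger = (wait_or_die(5, 9), wound_or_wait(5, 9))
younger_vs_older = (wait_or_die(9, 5), wound_or_wait(9, 5))
```

In both schemes the decision depends only on the fixed timestamps, so a cycle of transactions waiting for one another cannot form.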
DM sends a pre-commit elsewhere

Basic 2PL with Primary Copy 2PL for write synchronization

send a pre-commit to the primary copy site and await the reply

if it is rejected (due to site failure or deadlock), then restart the
transaction

Centralized  2PL 

send  a  pre-commit  to  the  central  site  and  await  the  reply 

if  it  is  rejected  (due  to  site  failure  or  deadlock),  then  restart  the  transac¬ 
tion 

Basic T/O with the Thomas Write Rule

send  pre-commits  to  the  destination  sites  and  await  the  replies 

if  any  of  the  sites  rejected  the  request  then  send  messages  to  the  other 
sites  to  inform  them  to  cancel  the  pre-commit  and  restart  the  transaction 

Multi-Version T/O

send  pre-commits  to  the  destination  sites  and  await  replies 

if  any  of  the  destination  sites  reject  the  request  then  send  messages  to  the 
other  sites  to  inform  them  to  cancel  the  pre-commit  and  then  restart  the 
transaction 

Conservative T/O

if  no  read  request  or  pre-commit  is  ready  then  send  null  operation 
requests  every  x  seconds 

send  pre-commits  to  the  destination  sites  and  await  the  replies 

Aggressive T/O

pre-commits  are  not  necessary  in  this  algorithm  because  of  the  ability  to 
undo  transactions 

Tickets 

if  a  ticket  is  not  available  then  wait  for  the  token  to  issue  more  tickets 
send  pre-commits  to  the  destination  sites,  each  with  a  ticket 

Majority  Consensus 

send  a  pre-commit  request  to  another  site  which  has  not  yet  voted  on  this 
request 

Wait  for  a  reply  that  a  majority  has  been  reached  or  that  a  majority  cannot 
be  reached 

if  a  majority  cannot  be  reached  then  inform  all  other  sites  that  have  voted 
so  that  they  can  reconsider  deferred  requests 

Wait  or  Die 

send  pre-commits  to  the  destination  sites  and  await  the  replies 
if it is rejected then restart the transaction

Wound  or  Wait 

send  pre-commits  to  the  destination  sites  and  await  the  replies 
if  it  is  rejected  then  restart  the  transaction 
DM receives a pre-commit from elsewhere

Basic  2PL  with  Primary  Copy  2PL  for  write  synchronization 

At  the  Primary  Copy  Site; 

if  the  lock  cannot  be  granted  then  place  pre-commit  in  a  lock  queue 
for  the  desired  item 

if  there  is  a  deadlock  and  this  request  is  selected  to  be  rejected,  then 
send  a  reject  message  to  the  processing  site 

when  it  gets  to  the  head  of  the  line,  then  mark  the  data  item  as  locked 
for  write 

write  the  data  items  to  secondary  storage 

if  writing  data  to  secondary  storage  failed,  then  reject  the  request  and 
inform  the  processing  site 

send  pre-commits  to  the  destination  sites  and  await  the  replies 

if  pre-commits  are  not  rejected,  then  send  an  acknowledgement  to  the 
processing  site 

if  a  pre-commit  is  rejected  (due  to  site  failure  or  deadlock),  then  send 
a  reject  message  to  the  processing  site 

wait  for  the  associated  update  request  to  either  arrive  or  get  restarted 
At  the  other  sites; 

write  the  data  items  to  secondary  storage 

if  all  is  ok  then  send  an  acknowledgment  to  the  primary  copy  site  else 
send  a  reject  message  to  the  primary  copy  site 

wait  for  the  associated  update  request  to  either  arrive  or  get  restarted 

Centralized 2PL

At  the  central  site; 

if  the  lock  cannot  be  granted,  then  place  pre-commit  in  a  lock 
queue  for  the  desired  item 

if  there  is  a  deadlock  and  this  transaction  is  selected  to  be 
rejected,  then  send  a  reject  message  to  the  processing  site 

when  it  gets  to  the  head  of  the  line,  then  mark  data  item  as 
locked  for  write 

write  the  data  items  to  secondary  storage 

If  writing  data  to  secondary  storage  fails  then  send  a  reject  mes¬ 
sage  to  the  processing  site 

if  writing  data  to  secondary  storage  does  not  fail  then; 

send  pre-commits  to  the  other  sites  and  await  the  replies 
if  pre-commits  are  not  rejected,  then  send  an  acknowledg¬ 
ment  to  the  processing  site 

if  pre-commits  are  rejected  (due  to  site  failure),  then  send  a 
reject  message  to  the  processing  site 

At  the  other  sites; 

write the data items to secondary storage

if all is ok, then send an acknowledgement else send a reject
message back to the central site

wait for the associated update request to either arrive or get
restarted

Basic T/O with the Thomas Write Rule

if  the  timestamp  of  the  pre-commit  is  greater  than  the  read  times¬ 
tamp  of  the  data  item  then  accept  it  else  reject  it 

if  it  was  accepted  and  if  the  timestamp  is  greater  than  the  write 
timestamp  of  the  data  item,  then  write  the  data  items  to  secondary 
storage.  Otherwise,  ignore  the  request 

send  a  message  to  the  processing  site  with  either  acceptance  or  rejec¬ 
tion 

queue  all  read  requests  that  have  timestamps  greater  than  the  pre¬ 
commit  timestamp  until  associated  update  request  arrives 

Multi-Version T/O

if the transaction is writing data that should have been read by another
transaction then reject it else accept it

if accepted then write the data items to secondary storage

send a message to the processing site with either acceptance or
rejection

queue all read requests until the associated update request either
arrives or gets restarted

Conservative T/O

place  the  pre-commit  in  pre-commit  request  queue  associated  with 
the  site  that  sent  the  pre-commit 

if  any  pair  of  queues  is  empty  then  wait 

process  the  request  with  the  lowest  timestamp 

if  the  request  is  a  read  then  send  the  value  of  the  data  item  to  the  pro¬ 
cessing  site 

if  the  request  is  a  pre-commit  then  write  the  data  items  to  secondary 
storage  and  if  all  is  ok  then  send  an  accept  message  to  the  processing 
site  else  if  write  was  unsuccessful  then  send  a  reject  message  to  the 
processing  site 

if the request is a pre-commit then wait for the associated update request
to either arrive or get restarted

Aggressive T/O

pre-commits  are  not  necessary  in  this  algorithm  because  of  the  ability 
to  undo  transactions 

Tickets 

write  the  data  items  to  secondary  storage 

if all is not ok then send a reject message to the processing site

wait for the associated update request to either arrive or get restarted

Majority Consensus

write  the  data  items  to  secondary  storage 

if  all  is  not  ok  then  send  a  reject  message  to  the  processing  site 

if  the  request  has  already  been  voted  on  then  vote  the  same  way  as 
before  on  the  request 

if  any  of  the  base  variables  (read  timestamps)  are  obsolete  then  vote 
to  reject 

else if each base variable (read timestamp) is current and the
request does not conflict with any pending requests then vote to
accept and mark the request pending

else  if  each  base  variable  is  current  but  the  request  conflicts  with 
a  pending  request  of  higher  priority  then  vote  to  pass 

else  defer  voting 

wait  for  the  associated  update  request  to  either  arrive  or  get  restarted 

Wait  or  Die 

if  there  is  a  conflict  and  the  request  is  before  all  the  ones  it  conflicted 
with  then  wait  until  all  the  requests  it  conflicts  with  are  either  ter¬ 
minated  or  rejected  else  reject  the  request 

if  it  is  rejected  then  send  message  back  to  the  processing  site  else 
write  the  data  items  to  secondary  storage  and  send  an  acknowledge¬ 
ment  to  the  processing  site 

wait  for  the  associated  update  request  to  either  arrive  or  get  restarted 

Wound  or  Wait 

if  there  is  a  conflict  and  the  request  is  before  all  the  ones  it  conflicted 
with  then  restart  the  requests  it  conflicts  with  if  they  have  not  ini¬ 
tiated  termination  else  wait  until  the  ones  it  conflicts  with  are  either 
terminated  or  rejected 

if  it  is  rejected  then  send  a  message  back  to  the  processing  site  else 
write  the  data  items  to  secondary  storage  and  send  an  acknowledge¬ 
ment  to  the  processing  site 

wait  for  the  associated  update  request  to  either  arrive  or  get  restarted 
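The Majority Consensus voting steps listed in this section form a short decision cascade, which can be sketched as follows. This is an illustration only: the function name, the argument layout, and the interpretation of "obsolete" as "older than the timestamp currently stored at this site" are assumptions, not details taken from the thesis.

```python
from typing import Dict


def cast_vote(request_id: str,
              base_timestamps: Dict[str, int],
              stored_timestamps: Dict[str, int],
              conflicts_with_pending: bool,
              pending_has_higher_priority: bool,
              votes_cast: Dict[str, str]) -> str:
    """Return this site's vote on a pre-commit: accept, reject, pass or defer."""
    # A request that was already voted on gets the same vote again.
    if request_id in votes_cast:
        return votes_cast[request_id]
    # Any obsolete base variable forces a reject vote.
    if any(base_timestamps[item] < stored_timestamps[item]
           for item in base_timestamps):
        vote = "reject"
    elif not conflicts_with_pending:
        vote = "accept"      # the caller would also mark the request pending
    elif pending_has_higher_priority:
        vote = "pass"
    else:
        return "defer"       # no vote is recorded; reconsidered later
    votes_cast[request_id] = vote
    return vote


history: Dict[str, str] = {}
v1 = cast_vote("r1", {"x": 4}, {"x": 4}, False, False, history)
v2 = cast_vote("r2", {"x": 3}, {"x": 4}, False, False, history)
v3 = cast_vote("r3", {"x": 4}, {"x": 4}, True, True, history)
v4 = cast_vote("r4", {"x": 4}, {"x": 4}, True, False, history)
```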
DM sends an update request elsewhere

Basic  2PL  with  Primary  Copy  2PL  for  write  synchronization 

send  update  requests  to  the  destination  sites 

send  messages  to  all  sites  that  were  locked  for  read  but  not  write  to 
release  read  locks 

Centralized  2PL 

send  update  requests  to  the  destination  sites 

Basic T/O with the Thomas Write Rule

send  update  requests  to  the  destination  sites 

Multi-Version T/O

send  update  requests  to  the  destination  sites 

Conservative T/O

send  update  requests  to  the  destination  sites 

Aggressive T/O

send  update  requests  to  the  destination  sites 

Tickets 

send  update  requests  to  the  destination  sites 

Majority  Consensus 

send  update  requests  to  the  destination  sites 

Wait  or  Die 

send  update  requests  to  the  destination  sites 

Wound  or  Wait 

send  update  requests  to  the  destination  sites 
DM receives an update request

Basic  2PL  with  Primary  Copy  2PL  for  write  synchronization 

copy  the  information  from  secondary  storage  to  the  database 
release  all  the  locks  that  are  held  for  this  transaction 

Centralized  2PL 

At  the  central  site; 

copy  the  information  from  secondary  storage  to  the  database 
release  all  the  locks  that  are  held  for  this  transaction 
At  the  other  sites; 

copy  the  information  from  secondary  storage  to  the  database 

Basic T/O with the Thomas Write Rule

copy the information from secondary storage to the database
update the write timestamps of the data items

process the queue of requests which were waiting for the update request
to arrive as if they are just arriving

Multi-Version T/O

copy  the  information  from  secondary  storage  to  database  along  with 
the  timestamp  of  the  pre-commit 

update  the  write  timestamp  of  the  data  items 

process  the  queue  of  requests  which  were  waiting  for  the  update 
request  to  arrive 

Conservative T/O

copy  the  information  from  secondary  storage  to  database 

Aggressive T/O

wait  for  the  maximum  of  R  time  units  after  origination  and  the  time  of 
arrival  of  the  request  at  the  destination  site 

if  received  request  can  be  applied  without  creating  a  conflict  with 
some  previously  received  request,  then  apply  request 

if not, then consider the full knowledge of request histories to date
and reconsider all the rejection decisions

reapply  all  the  accepted  transactions  in  timestamp  order 

Tickets 

copy  the  information  from  secondary  storage  to  the  database 

Majority  Consensus 

reject  all  the  conflicting  deferred  requests 

copy  the  information  from  secondary  storage  to  the  database 
Wait or Die

copy  the  information  from  secondary  storage  to  the  database 
release  all  the  locks  that  are  held  for  this  transaction 

Wound or Wait

copy  the  information  from  secondary  storage  to  the  database 
release  all  the  locks  that  are  held  for  this  transaction 
Appendix C

Claim: Basic T/O with the Thomas Write Rule has at least as many rejections as
Multi-Version T/O.

Argument: The claim is substantiated by showing that any sequence of requests
that leads to a request being rejected under Multi-Version T/O will also lead to a
rejection under Basic T/O with the Thomas Write Rule. In addition, it will be
shown that there are instances where rejections can occur under Basic T/O
with the Thomas Write Rule that would not occur under Multi-Version T/O.

The claim is trivially true for read requests, as there can be no rejected reads
under Multi-Version T/O, while the rejection of a read request is possible under
Basic T/O with the Thomas Write Rule.

Note that if a request is rejected under Multi-Version T/O then there must
have been a read timestamp for the data item that lies between the timestamp
of the request and the timestamp of the write request that next wrote the data
item. This means that a read request with a later timestamp has already been
processed for the data item. The arriving request will also be rejected under
Basic T/O with the Thomas Write Rule because the request's timestamp is
necessarily less than or equal to the read timestamp of the data item. This
shows that if a request is to be rejected under Multi-Version T/O then it will also
be rejected under Basic T/O with the Thomas Write Rule.

The converse is not true. A simple example will show a case where Basic T/O
with the Thomas Write Rule will reject a request that Multi-Version T/O accepts.
Suppose the site has received a request with a timestamp T and the site has
already processed an update (write) with timestamp T-1, another update with
timestamp T+1, and a read request with timestamp T+2, all for the data item in
question. Figure C.1 shows a time line for the example where w stands for a
write request (or pre-commit) and r stands for a read request. Recall that the
arrival order at the site of these requests is: w with timestamp T-1 followed by
w with timestamp T+1 followed by r with timestamp T+2 followed by the w
request with timestamp T. When the pre-commit with timestamp T arrives, it will
be rejected if Basic T/O with the Thomas Write Rule is being used because the
request's timestamp is less than the read timestamp of the data item, namely
T+2. Under Multi-Version T/O, the pre-commit can and will be accepted, since no
read timestamp lies between T and T+1, the timestamp of the write that next
wrote the data item. That completes the argument.
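The example in the argument above can be replayed with a minimal single-item sketch of the two schemes. It follows the acceptance rules as they are listed in Appendix B, but the class names, the use of one data item, and the concrete value T = 10 are assumptions made for illustration; queueing of reads behind pre-commits is omitted.

```python
class BasicTO:
    """Basic T/O with the Thomas Write Rule, for one data item."""

    def __init__(self) -> None:
        self.read_ts = float("-inf")
        self.write_ts = float("-inf")

    def read(self, ts: float) -> bool:
        if ts <= self.write_ts:
            return False                      # read arrives too late
        self.read_ts = max(self.read_ts, ts)
        return True

    def write(self, ts: float) -> bool:
        if ts <= self.read_ts:
            return False                      # a later read was already served
        if ts > self.write_ts:
            self.write_ts = ts                # otherwise the write is ignored
        return True


class MultiVersionTO:
    """Multi-Version T/O for one data item: write_ts -> largest read_ts."""

    def __init__(self) -> None:
        self.versions = {float("-inf"): float("-inf")}   # initial version

    def read(self, ts: float) -> bool:
        version = max(w for w in self.versions if w < ts)
        self.versions[version] = max(self.versions[version], ts)
        return True                           # reads are never rejected

    def write(self, ts: float) -> bool:
        previous = max(w for w in self.versions if w < ts)
        if self.versions[previous] > ts:
            return False                      # a later read saw the old version
        self.versions[ts] = ts
        return True


T = 10
arrivals = [("w", T - 1), ("w", T + 1), ("r", T + 2), ("w", T)]


def replay(scheduler) -> list:
    return [scheduler.write(ts) if op == "w" else scheduler.read(ts)
            for op, ts in arrivals]


basic_result = replay(BasicTO())
multi_result = replay(MultiVersionTO())
```

Replaying the four arrivals, the last write is rejected by `BasicTO` and accepted by `MultiVersionTO`, which matches the outcome described in the argument.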
                                                                5.0   6   3  10  14
      0.51      0.52          1.1        1.32      1.39          5.4   6   3  10  18
      0.43      0.43         -0.2        1.28      1.31          2.4   6   3  10  22
      0.98      0.91         -7.0        5.32      5.98         12.4   6   1  10   5
      1.18      1.11         -6.0        3.54      4.02         13.4   6   2  10   5
      1.33      1.31         -1.8        2.60      2.65          2.0   6   3  10   5

7 = 1, S[1] = 1.0, S[2] = .80, S[3] = .60

Table II-1
           Throughput                      Time in System              Parameters
Simulation  Analytic  %Difference  Simulation  Analytic  %Difference   M  ml   N   z
      1.01      0.93         -7.5        4.90      5.70         16.3   4   3  10   5
      1.07      0.86        -19.7        4.30      6.64         54.5   6   3  10   5
      1.14      1.03         -9.9        3.76      4.74         26.0   8   3  10   5
      1.17      1.16         -1.0        3.53      3.63          2.9  10   3  10   5
      0.62      0.61         -0.9        1.57      1.51         -3.9   6   3   4   5
      1.11      0.89        -19.4        5.82      8.42         44.7   6   3  12   5
      1.11      1.06         -4.4       12.91     13.85          7.3   6   3  20   5
      1.11      1.09         -1.6        7.03      7.15          1.7   6   3  10   2
      1.04      0.82        -21.3        3.56      6.22         74.6   6   3  10   6
      0.82      0.75         -8.4        2.43      3.31         36.3   6   3  10  10
      0.63      0.62         -1.0        1.81      2.04         12.6   6   3  10  14
      0.51      0.51         -0.2        1.56      1.65          6.0   6   3  10  18
      0.43      0.43         -0.9        1.56      1.48         -5.3   6   3  10  22
      0.98      1.09         11.2        5.16      4.15        -19.6   6   1  10   5
      1.08      1.08         -0.5        4.28      4.30          0.5   6   2  10   5
      1.07      0.86        -19.7        4.24      6.64         56.7   6   3  10   5

7 = 2, S[1] = 1.0, S[2] = .80, S[3] = .60

Table II-2
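The %Difference columns in these tables appear to report the deviation of the analytic value from the simulation value, relative to the simulation value. The helper below is a sketch under that assumption (the function name is hypothetical); because the printed entries are themselves rounded, recomputing from them can differ from the printed percentage in the last digit.

```python
def percent_difference(simulation: float, analytic: float) -> float:
    """Deviation of the analytic estimate from the simulation result, in percent."""
    return (analytic - simulation) / simulation * 100.0


# Time in system for the first row of Table II-2: simulation 4.90, analytic 5.70.
time_diff = round(percent_difference(4.90, 5.70), 1)
```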


           Throughput                      Time in System              Parameters
Simulation  Analytic  %Difference  Simulation  Analytic  %Difference   M  ml   N   z
      1.01      1.11          9.9        4.90      4.00        -18.4   4   3  10   5
      1.07      0.96        -10.1        4.30      5.40         25.5   6   3  10   5
      1.14      1.04         -8.9        3.76      4.63         23.1   8   3  10   5
      1.17      0.89        -24.1        3.53      6.27         77.6  10   3  10   5
      0.62      0.61         -1.6        1.57      1.56          0.6   6   3   4   5
      1.11      1.06         -4.6        5.82      6.33          8.8   6   3  12   5
      1.11      1.31         17.8       12.91     10.30        -20.2   6   3  20   5
      1.11      1.38         24.5        7.03      6.24        -26.5   6   3  10   2
      1.04      0.88        -16.7        3.56      5.40         51.7   6   3  10   6
      0.81      0.68        -16.4        2.43      4.76         96.0   6   3  10  10
      0.63      0.60         -5.1        1.81      2.72         50.3   6   3  10  14
      0.51      0.50         -1.4        1.56      1.89         21.2   6   3  10  18
      0.43      0.42         -1.5        1.56      1.61          3.1   6   3  10  22
      0.98      1.22         24.5        5.16      3.17        -38.6   6   1  10   5
      1.08      0.96        -11.0        4.28      5.40         26.1   6   2  10   5
      1.07      0.96        -10.4        4.24      5.40         27.3   6   3  10   5

7 = 3, S[1] = 1.0, S[2] = .80, S[3] = .60
           Throughput                      Time in System              Parameters
Simulation  Analytic  %Difference  Simulation  Analytic  %Difference   M  ml   N   z
      0.98      0.85        -13.0        5.14      6.73         30.9   4   3  10   5
      0.98      0.90         -7.8        5.18      6.06         17.0   6   3  10   5
      0.98      0.92         -6.3        5.13      5.89         14.8   8   3  10   5
      1.00      0.92         -7.6        5.08      5.83         14.7  10   3  10   5
      0.60      0.59         -1.4        1.64      1.76          7.5   6   3   4   5
      1.01      0.88        -12.9        6.89      8.64         26.3   6   3  12   5
      0.99      0.78        -21.2       15.09     20.70         37.2   6   3  20   5
      1.00      0.91         -8.8        7.96      8.96         12.6   6   3  10   2
      0.96      0.88         -7.8        4.47      5.30         18.6   6   3  10   6
      0.79      0.76         -4.4        2.74      3.24         18.4   6   3  10  10
      0.61      0.61          0.7        2.06      2.28         10.6   6   3  10  14
      0.51      0.50         -1.2        1.76      1.85          5.4   6   3  10  18
      0.42      0.42          0.8        1.63      1.62         -0.5   6   3  10  22
      0.98      0.91         -7.0        5.20      5.98         15.0   6   1  10   5
      0.99      0.91         -8.1        5.09      5.99         17.6   6   2  10   5
      0.98      0.90         -7.8        5.14      6.06         17.9   6   3  10   5

7 = 1, S[1] = 1.0, S[2] = 1.0, S[3] = 1.0

Table III-1


Throughput                          Time in System                      Parameters
Simulation  Analytic  % Difference  Simulation  Analytic  % Difference  ?    ml   N    Z
0.97        0.90      -7.3          5.30        6.12      15.5          4    3    10   5
1.00        0.75      -24.8         5.05        8.30      64.4          6    3    10   5
0.99        0.82      -17.6         5.10        7.27      42.5          8    3    10   5
0.98        0.88      -9.8          5.24        6.31      20.5          10   3    10   5
0.61        0.60      -2.2          1.62        1.70      5.1           6    3    4    5
0.98        0.79      -19.6         7.20        10.24     42.2          6    3    12   5
1.00        0.92      -7.9          14.80       16.70     12.9          6    3    20   5
0.99        0.94      -5.0          8.07        8.63      7.0           6    3    10   2
0.96        0.71      -25.6         4.41        8.00      81.3          6    3    10   6
0.78        0.65      -16.4         2.71        5.34      97.2          6    3    10   10
0.63        0.60      -4.8          2.06        2.68      30.0          6    3    10   14
0.50        0.50      0.3           1.79        1.94      8.5           6    3    10   18
0.42        0.42      0.7           1.56        1.65      5.9           6    3    10   22
0.96        1.09      13.5          5.32        4.15      -22.0         6    1    10   5
0.99        0.97      -1.6          5.23        5.27      0.7           6    2    10   5
1.00        0.75      -24.8         5.05        8.30      64.4          6    3    10   5

7 = 2, S[1] = 1.0, S[2] = 1.0, S[3] = 1.0
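The % Difference columns are consistent with the deviation of the analytic value from the simulation value, taken relative to the simulation value. The Python sketch below assumes that definition (the function name is illustrative and does not come from the thesis) and reproduces one of the time-in-system entries. Entries recomputed from the two-decimal figures can differ in the last digit from the printed ones, presumably because the originals were computed before rounding.

```python
def percent_difference(simulation: float, analytic: float) -> float:
    """Deviation of the analytic value from the simulated one, in percent."""
    return 100.0 * (analytic - simulation) / simulation


# Time-in-system figures 5.30 (simulation) and 6.12 (analytic) from the table.
print(round(percent_difference(5.30, 6.12), 1))  # 15.5
```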


Throughput                          Time in System                      Parameters
Simulation  Analytic  % Difference  Simulation  Analytic  % Difference  ?    ml   N    Z
0.98        1.11      13.3          5.25        4.00      -23.8         4    3    10   5
0.99        0.95      -3.8          5.06        5.51      8.8           6    3    10   5
0.98        1.00      2.3           5.27        4.97      -5.7          8    3    10   5
0.97        0.82      -15.2         5.30        7.16      35.1          10   3    10   5
0.58        0.61      4.8           1.69        1.58      -6.5          6    3    4    5
1.00        1.05      4.7           6.96        6.46      -7.2          6    3    12   5
1.00        1.29      28.9          14.81       10.51     -29.0         6    3    20   5
1.00        1.36      36.2          8.02        5.34      -33.4         6    3    10   2
0.95        0.87      -8.6          4.49        5.51      22.7          6    3    10   6
0.78        0.67      -14.1         2.74        4.92      79.7          6    3    10   10
0.64        0.59      -7.3          2.02        2.86      41.5          6    3    10   14
0.50        0.50      0.3           1.73        1.94      12.0          6    3    10   18
0.41        0.42      3.2           1.52        1.63      7.5           6    3    10   22
0.98        1.09      11.2          5.17        4.16      -19.6         6    1    10   5
0.96        0.95      -0.8          5.34        5.51      3.1           6    2    10   5
0.99        0.95      -3.8          5.06        5.51      8.8           6    3    10   5

7 = 3, S[1] = 1.0, S[2] = 1.0, S[3] = 1.0
Appendix E


Parameters                    Basic T/O with the Thomas Write Rule        Basic 2PL with Primary Copy
NOS  NOGRA  MPL  THINK  COMS  Throughput  Ave Resp Time  # of rejections  Throughput  Ave Resp Time  # of deadlocks
2    10     2    1      1     .013        297            15               .007        418            7
2    20     2    1      1     .013        306            4                .010        372            0
3    10     2    1      1     .015        364            19               .008        673            8
4    10     2    1      1     .017        409            41               .007        693            12
5    10     2    1      1     .019        443            54               .004        815            27
2    10     2    1      1     .013        297            15               .007        416            7
2    10     3    1      1     .018        296            30               .007        483            17
2    10     4    1      1     .022        313            46               .007        527            27
2    10     5    1      1     .028        299            81               .008        552            34
2    10     8    1      1     .038        300            203              .005        595            61
2    10     10   1      1     .045        291            285              .004        557            101

Table E.1 - Load-Independent


NOLR = 1
NOPC = 1
MULT = 4
LRS = 60
PCS = 100
UPS = 60
LENGTH = 10000


Parameters                    Basic T/O with the Thomas Write Rule        Basic 2PL with Primary Copy
NOS  NOGRA  MPL  THINK  COMS  Throughput  Ave Resp Time  # of rejections  Throughput  Ave Resp Time  # of deadlocks
2    10     2    1      1     .013        297            15               .007        373            20
2    20     2    1      1     .013        306            4                .009        395            8
3    10     2    1      1     .015        364            19               .007        464            26
4    10     2    1      1     .017        409            41               .007        556            41
5    10     2    1      1     .019        443            54               .005        663            57
6    10     2    1      1     .021        467            71               .005        715            66
2    10     2    1      1     .013        297            15               .007        373            20
2    10     3    1      1     .018        296            30               .009        402            57
2    10     4    1      1     .022        313            48               .009        387            87
2    10     5    1      1     .028        290            81               .007        410            161
2    10     8    1      1     .038        300            203              .006        412            212
2    10     10   1      1     .045        291            285              .006        412            212

Table E.2 - Load-Independent

NOLR = 1
NOPC = 1
MULT = 2
LRS = 60
PCS = 100
UPS = 60
LENGTH = 10000


Parameters                    Basic T/O with the Thomas Write Rule        Basic 2PL with Primary Copy
NOS  NOGRA  MPL  THINK  COMS  Throughput  Ave Resp Time  # of rejections  Throughput  Ave Resp Time  # of deadlocks
2    10     2    1      1     .000        417            10               .006        507            6
2    20     2    1      1     .010        403            4                .007        466            2
2    30     2    1      1     .000        407            3                .007        474            1
2    40     2    1      1     .010        305            3                .006        485            1
3    10     2    1      1     .010        513            10               .006        695            10
4    10     2    1      1     .012        605            22               .00?        860            17
6    5      2    1      1     .012        665            68               .001        1105           39
2    10     2    1      1     .000        417            10               .006        507            6
2    10     3    1      1     .012        456            19               .007        575            13
2    10     4    1      1     .014        402            34               .006        630            24
2    10     5    1      1     .015        511            55               .006        613            40
2    10     8    1      1     .010        601            110              .005        658            73
2    10     10   1      1     .020        656            145              .004        666            96

Table E.3 - Load-Dependent


NOLR = 1
NOPC = 1
MULT = 4
LENGTH = 10000


Parameters                    Basic T/O with the Thomas Write Rule        Basic 2PL with Primary Copy
NOS  NOGRA  MPL  THINK  COMS  Throughput  Ave Resp Time  # of rejections  Throughput  Ave Resp Time  # of deadlocks
2    10     2    1      10    .009        422            11               .006        572            5
2    20     2    1      10    .009        421            6                .007        531            2
3    10     2    1      10    .010        528            18               .005        718            11
4    10     2    1      10    .011        615            26               .005        062            15
6    5      2    1      10    .012        695            97               .000        1155           31
2    10     2    1      10    .009        422            11               .006        572            5
2    10     3    1      10    .011        456            24               .006        750            20
2    10     4    1      10    .014        489            38               .007        691            16
2    10     5    1      10    .015        516            46               .006        759            29
2    10     6    1      10    .017        622            116              .004        735            70
2    10     10   1      10    .020        667            150              .004        729            79

Table E.4 - Load-Dependent


NOLR = 1
NOPC = 1
MULT = 4
LENGTH = 10000
Parameters                    Basic T/O with the Thomas Write Rule        Basic 2PL with Primary Copy
NOS  NOGRA  MPL  THINK  COMS  Throughput  Ave Resp Time  # of rejections  Throughput  Ave Resp Time  # of deadlocks
2    10     2    100    10    .005        630            20               .003        1048           6
2    20     2    100    10    .006        625            0                .004        032            1
3    10     2    100    10    .008        810            20               .002        1480           13
4    10     2    100    10    .007        030            50               .00?        1778           15
6    5      2    100    10    .007        1077           172              .000        2385           44
2    10     2    100    10    .005        630            20               .003        1048           6
2    10     3    100    10    .008        857            48               .003        1167           22
2    10     4    100    10    .000        676            78               .002        1414           04
2    10     5    100    10    .011        701            105              .001        1507           127
2    10     8    100    10    .014        774            235              .002        860            0
2    10     10   100    10    .016        817            344              .002        826            0

Table E.5 - Load-Dependent


NOLR = 1
NOPC = 1
MULT = 4
LENGTH = 3




Parameters                    Basic T/O with the Thomas Write Rule        Basic 2PL with Primary Copy
NOS  NOGRA  MPL  THINK  COMS  Throughput  Ave Resp Time  # of rejections  Throughput  Ave Resp Time  # of deadlocks
2    20     1    100    10    .00?        355            1                .003        452            1
2    20     2    100    10    .008        406            6                .006        523            4
2    40     1    1      10    .005        375            1                .004        502            0
4    10     1    100    10    .006        538            13               .00?        711            2
2    20     1    1      100   .002        860            0                .003        584            1
2    20     2    100    100   .003        024            2                .003        583            0
4    10     1    100    100   ?           868            0
4    10     2    100    100   ?           025            42

Table E.6 - Load-Dependent

NOLR = 1
NOPC = 1
MULT = 4
LENGTH = 30000


Parameters                    Basic 2PL without Primary Copies            Basic 2PL with Primary Copy
NOS  NOGRA  MPL  THINK  COMS  Throughput  Ave Resp Time  # of rejections  Throughput  Ave Resp Time  # of deadlocks
2    30     2    1      1     .000        404            4                .008        482            1
2    30     3    1      1     .012        455            15               .010        530            0
2    30     4    1      1     .014        499            39               .011        581            21
2    30     5    1      1     .014        552            30               .013        626            36
2    30     8    1      1     .010        655            107              .015        720            102
2    30     10   1      1     .015        677            200              .014        744            176
2    30     15   1      1     .015        756            303              .013        607            356
4    4      4    1      1                                                 .000        880            220
4    8      4    1      1     .001        1034           226              .002        1001           188
4    12     4    1      1     .003        667            105              .005        066            150
4    16     4    1      1     .007        650            147              .007        910            138
2    10     2    1      1     .007        453            17               .006        517            14
3    10     2    1      1     .006        585            42               .006        682            33
4    10     2    1      1     .005        717            64               .005        870            80
6    10     2    1      1     .008        1097           102              .003        1155           103

Table E.7 - Load-Dependent (Both 2PL Algorithms)

NOLR = 1
NOPC = 1
MULT = 4
LENGTH = 30000


Parameters                    Basic 2PL without Primary Copies            Basic 2PL with Primary Copy
NOS  NOGRA  MPL  THINK  COMS  Throughput  Ave Resp Time  # of rejections  Throughput  Ave Resp Time  # of deadlocks
2    30     2    1      1     .005        628            6                .004        883            0
2    30     3    1      1     .007        671            13               .005        ?              4
2    30     4    1      1     .007        732            23               .006        1021           7
2    30     5    1      1     .007        774            39               .007        1057           17
2    30     6    1      1     .007        913            80               .007        1210           52
2    30     10   1      1     .007        1005           106              .008        1283           72
2    30     15   1      1     .007        1150           179              .007        1407           151
4    4      4    1      1     .000        2515           88               .000        2412           70
4    8      4    1      1     .000        2042           85               .000        2515           67
4    12     4    1      1     .001        2313           72               .002        2430           52
4    16     4    1      1     .002        1543           66               .002        2064           50
2    10     2    1      1     .003        715            12               .003        1007           6
3    10     2    1      1     .002        1210           26               .002        1518           14
4    10     2    1      1     .001        1962           20               .002        1807           20
6    10     2    1      1     .000        2125           46               .001        2900           27

Table E.8 - Load-Dependent (Both 2PL Algorithms)

NOLR = 1
NOPC = 1
MULT = 4
LENGTH = 30000




Parameters                    Basic T/O with the Thomas Write Rule        Basic 2PL with Primary Copy
NOS  NOGRA  MPL  THINK  COMS  Throughput  Ave Resp Time  # of rejections  Throughput  Ave Resp Time  # of deadlocks
2    30     2    1      1     .011        332            14               .006        482            1
2    30     3    1      1     .014        385            28               .010        530            0
2    30     4    1      1     .017        432            38               .011        581            21
2    30     5    1      1     .019        486            65               .013        626            36
2    30     8    1      1     .022        627            144              .015        720            102
2    30     10   1      1     .024        700            221              .014        744            176
2    30     15   1      1     .025        014            500              .013        807            356
2    30     20   1      1     .024        1124           736
4    4      4    1      1     .011        506            1606             .000        880            220
4    8      4    1      1     .016        337            020              .002        1001           188
4    12     4    1      1     .010        571            574              .005        088            159
4    16     4    1      1     .022        572            278              .007        010            138
4    20     4    1      1     .022        576            205              .009        037            110
2    10     2    1      1     .011        320            51               .006        517            14
3    10     2    1      1     .014        372            108              .006        682            33
4    10     2    1      1     .015        426            172              .005        870            60
6    10     2    1      1     .018        484            360              .003        1155           103

Table E.9 - Load-Dependent (extended version)

NOLR = 1
NOPC = 1
MULT = 4
LENGTH = 30000


Parameters                    Basic T/O with the Thomas Write Rule        Basic 2PL with Primary Copy
NOS  NOGRA  MPL  THINK  COMS  Throughput  Ave Resp Time  # of rejections  Throughput  Ave Resp Time  # of deadlocks
2    30     2    1      1     .006        582            6                .004        883            0
2    30     3    1      1     .009        608            15               .005        066            4
2    30     4    1      1     .011        664            26               .006        1021           7
2    30     5    1      1     .013        683            44               .007        1057           17
2    30     8    1      1     .018        786            100              .007        1210           52
2    30     10   1      1     .020        862            157              .008        1283           72
2    30     15   1      1     ?           1025           308              .007        1407           151
2    30     20   1      1     .024        1104           557
4    4      4    1      1     .008        873            1141             .000        2412           79
4    8      4    1      1     .012        014            341              .000        2515           87
4    12     4    1      1     .013        032            200              .002        2430           52
4    16     4    1      1     .014        037            140              .002        2064           50
4    20     4    1      1     .015        031            107              .004        1887           43
2    10     2    1      1     .006        601            19               .003        1007           6
3    10     2    1      1     .007        742            34               .002        1516           14
4    10     2    1      1     .008        863            55               .002        1807           20
6    10     2    1      1     .000        1000           120              .001        2900           27

Table E.10 - Load-Dependent (extended version)

NOLR = 1
NOPC = 1
MULT = 4
LENGTH = 30000




Table E.11 - Non-identical Site Model (Think time for special site is 10)

NOLR = 1
NOPC = 1
MULT = 4
LENGTH = 30000


Parameters                    Basic T/O with the Thomas Write Rule        Basic 2PL with Primary Copy
NOS  NOGRA  MPL  THINK  COMS  Throughput  Ave Resp Time  # of rejections  Throughput  Ave Resp Time  # of deadlocks
2    10     2    1      100   .003        567            1                .003        1060           2
2    15     2    1      100   .003        582            2                .003        064            2
2    20     2    1      100   .003        564            2                .003        626            3
2    25     2    1      100   .003        555            1                .004        006            2
2    30     2    1      100   .005        556            1                .004        002            1
2    10     2    1      100   .005        567            1                .003        1060           2
2    10     3    1      100   .005        506            0                .003        1147           20
2    10     4    1      100   .006        602            16               .002        1283           35
2    10     ?    1      100   .008        652            47               .002        1301           64
2    10     8    1      100   .010        662            110              .002        1466           83
2    10     10   1      100   .012        706            167              .001        1423           126
2    10     15   1      100                                               .001        1640           181
2    10     2    1      100   .003        567            1                .003        1060           2
3    10     2    1      100   .005        716            17               .003        1300           10
4    10     2    1      100   .006        832            35               .002        1753           17
?    10     2    1      100   .008        862            87               .001        2854           28

Table E.12 - Non-identical Site Model (Think time for special site is 10)


NOLR = 1
NOPC = 1
MULT = 4
LENGTH = 30000


Appendix F

Implementation of the analytic solution technique described in Chapters 2 and 3. Written in C and run on a PDP-11/45. (Also run on a VAX-11/780.)
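The sample output that follows reports a starting guess of 1.0 for each unknown, a tolerance of 0.001, an iteration count, and a DIFF value per unknown, which suggests a successive-substitution loop. The Python sketch below illustrates only that control structure. The update function `toy_update` is a hypothetical contraction chosen for the example; it is not the thesis model, and the names are assumptions.

```python
from typing import Callable, List, Tuple


def solve_iteratively(update: Callable[[List[float]], List[float]],
                      start: List[float],
                      tol: float = 0.001,
                      max_iter: int = 1000) -> Tuple[List[float], int]:
    """Repeat x <- update(x) until every component changes by less than tol."""
    x = list(start)
    for count in range(1, max_iter + 1):
        new_x = update(x)
        diffs = [n - o for n, o in zip(new_x, x)]
        x = new_x
        if all(abs(d) < tol for d in diffs):
            return x, count
    raise RuntimeError("iteration did not converge")


def toy_update(w: List[float]) -> List[float]:
    # Hypothetical contraction mapping standing in for the model equations.
    return [0.5 * w[0] + 0.1, 0.25 * w[1] + 0.03, 0.5 * w[2] + 0.2 * w[0]]


solution, count = solve_iteratively(toy_update, [1.0, 1.0, 1.0])
print(count, [round(v, 3) for v in solution])
```

Stopping on the change between successive iterates, rather than on a residual, matches the DIFF values shown in the output, but whether the original C program used exactly this test is an assumption.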
Version 1(0)


Sample  Output  -  Analytic  Model 


ct<0.  Kl*?,  N=lCt  St:‘..COCOOO.  Kr  1 1  fu.i  I  :  C  .  7  ^  0  1  ?  1 ,  o  v  f  r  (,«•  jirtrO  .  0 1 0  0  0  0  S 1 1  ]  =  1 .  ^  C  0  CO  0  SC  2  3  =  1  .2  0  0  0  0  0  $15  3  =  1.00  000  0  St  *>  J  =  0  .  ■>  C  0  CO  C  Stt.3  =  C.H 
Ctiftlr.s  tLf.s  Wji=1.000000,  l  =  l.  OUOCCJO  ,  Wf  ».=  l  .  CO  OOCO  ,  !ol  =  0.00  100l),«vr:l.CS00U0 

Cutnt=7t.M  =  C,rC>^5«2,knl:0.CCij‘  5  .Wfs-rS.l^'tliSr.UIM  1  (  W(; )  -  -Q  .  0  00  7  5  0  ,  D  ;  F  f  2  (  km  I  >  =- C  .  0  0  00  0^  1 0  If  F  5  <  w  c  s  )  =  -  0  .0  C  01  05 


Cctsor.=  l 

.ocoocc. 

Summer. in=1.0fCO(iC.c>pg=0 

1*666  5(0) 

=0.9C1251 

Ctf  (C  )  =  C..CC’  tPf 

PPO'K  0.9) 

Z 

0  .0C56U9 

/cfA.-.d) 

=  t .  0  9  1  •  2  3 

666(1)  =6.611  6 71 

PPOP( 1,9) 

= 

0 .6  36  125 

r  c  F  A  r.  ( 2  > 

=C.0C1I17 

CCC(?)=0.111 372 

PKni( 2,9) 

0.1)5785 

rf.R*n<3) 

=  0.CCf.  10b 

6CC ( 3  >  =  6 .219002 

PROI*(  3.9) 

? 

0.216175 

fr.F  »*.(i) 

=  L.C  CC  002 

CCC ( 1 >=0.268921 

PFObd.S) 

i 

o.26!)  :h? 

»  CR»M5> 

CC.CCCCOO 

CCCC.  )  =  0.1616l  1 

PROP (5, 9) 

r 

0.208176 

F  CF-Al.C/,) 

cc.corooo 

CCC(6):U.&60*25 

PR0P(6,9> 

r 

0.  Ml  36  3 

PCRANI 77 

co.orccoo 

CCC( 77=0.021508 

PR0h( 7.9) 

= 

0 .0  31  212 

FCRAl.ie) 

=o.ococoo 

CCCl 8>=C.003 19C 

PROP (f, 9) 

r 

0.007661 

rr.RAS(9) 

=o.ccccoo 

OCC(9)=t.OCO2P0 

rppb( 9,9) 

0 .000682 

Prob of getting blocked on 0 of 1 requests is: 0.?09??1  MW(0)=2.709291
Prob of getting blocked on 1 of 1 requests is: 0.095719  MW(1)=2.709291


PROBB(0,1)=0.975000   PROBB(1,1)=0.025000
PROBB(0,2)=0.950625   PROBB(1,2)=0.049375
PROBB(0,3)=0.926859   PROBB(1,3)=0.073141
PROBB(0,4)=0.903688   PROBB(1,4)=0.096312
PROBB(0,5)=0.881096   PROBB(1,5)=0.118904
PROBB(0,6)=0.859068   PROBB(1,6)=0.140932
PROBB(0,7)=0.837592   PROBB(1,7)=0.162408
PROBB(0,8)=0.816652   PROBB(1,8)=0.183348
PROBB(0,9)=0.796236   PROBB(1,9)=0.203764
PROBB(0,10)=0.776330  PROBB(1,10)=0.223670
SYSTEM THROUGHPUT IS: 1.110328
Time in sys is: 4.006317
Lambda is: 2.709291
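The PROBB values printed for this run are consistent with a simple pattern: PROBB(0,k) equals (1 - p)^k with p = PROBB(1,1) = 0.025, and PROBB(1,k) is its complement. This relation is inferred from the printed numbers, not taken from the program source, so the Python sketch below is an illustration under that assumption; the function name is hypothetical.

```python
def probb_table(p_block: float, max_requests: int = 10):
    """Return (k, PROBB(0,k), PROBB(1,k)) assuming PROBB(0,k) = (1 - p)^k."""
    rows = []
    for k in range(1, max_requests + 1):
        none_blocked = (1.0 - p_block) ** k
        rows.append((k, none_blocked, 1.0 - none_blocked))
    return rows


for k, p0, p1 in probb_table(0.025):
    print(f"PROBB(0,{k})={p0:.6f}  PROBB(1,{k})={p1:.6f}")
```

Calling `probb_table(0.02)` and `probb_table(0.01)` gives the corresponding values for the other two runs, whose first entries are 0.980000 and 0.990000.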


Version 1(1)

C  =  5C.  PL  =  e»  N=10.  St  =  5.000l'C0,  Rc  si  cual  =0.7  U  35  I,  cvi'r  hrad  =  0  .013  00  0  $11  3  =  1.5  00  00  0  SC  2  3=1  .3  0000  0  SL  3  3  =  1 . 10  000  0  St  1  J=1 .  C  0  C  00  0  SttJ  =  0.9 
l^.tartlns  c-uess  k  ^  =  1 . 00  00  CO «  Uid  I  =  i  .  00  GO  00  «  kc  s  =  1  .  00  00  00  .  To  1=  0  .  Oo  )  0  0  0  .  a  v  r  =  1 . 0  0  0  0  C  0 

Ccun  1  =  7  .  k'.iiC  .2  51  SIC.  knil  =  0.  CUSiei  .kl  ts  =  t  .3  69  01  C.C  IFF  1  (  Wo  )  =  -0  .0  J0367.  bIFF2  (  knil  ):“0.  0000  16.0  1FF3  (Uts  >  =  -0.00  06  27 


Pc tsumxl  .C6CCC0.  ; 

Sumpyranxl . 

OCCOCO.eiipq=0 

FGPArjlC)  =0.9  U  715 

CCC( C)=C 

. 063  115 

PROtK  C,9) 

r 

C  .002751 

PC-RAVd  )=0.tf  0075 

CCCd  )=0 

.027971 

PPOH(  1,9) 

s 

0  .022912 

f  r-RAK(2>=C.C03l09 

06C(2)=C 

.103007 

PR09( 2, 9) 

= 

0 .081  80  1 

fChAN(3)=C.0Cl S70 

OCC(3>=0 

.2126 35 

PROlU  3,9) 

0.163085 

Frt>Rri(i)  =  o.rccooi 

OCC(1)=0 

.271  115 

PRCP(1,9) 

z 

0  .251  108 

rf.RAN(5)=C.0CCtCU 

CCC(5)=C 

21 162 

PKOB( 5,9) 

0.235121 

rCRAKd  1  =  6.00000  0 

rcc((.)  =  o 

.115857 

PROP  ((,.9) 

r 

0.1150U. 

F  C  RAb( 7)  =  0  .CCkPO  Q 

OtCl  7)  =  C 

.037108 

PROPl 7,9) 

= 

0.657911 

F'r-RAA(6>  =  C.croO0  0 

CCC<8  )  =  0 

.006  H  I 

PROb( 8,9) 

z 

0.013301 

F'CRAM9)rO.OCCCCO 

tCC<9>:C 

.000522 

P HOP (9,9) 

z 

0.001368 

Prob of getting blocked on 0 of 1 requests is: 0.913?91  MW(0)=3.027082
Prob of getting blocked on 1 of 1 requests is: 0.083255  MW(1)=3.027082


PROBB(0,1)=0.980000   PROBB(1,1)=0.020000
PROBB(0,2)=0.960400   PROBB(1,2)=0.039600
PROBB(0,3)=0.941192   PROBB(1,3)=0.058808
PROBB(0,4)=0.922368   PROBB(1,4)=0.077632
PROBB(0,5)=0.903921   PROBB(1,5)=0.096079
PROBB(0,6)=0.885842   PROBB(1,6)=0.114158
PROBB(0,7)=0.868126   PROBB(1,7)=0.131874
PROBB(0,8)=0.850763   PROBB(1,8)=0.149237
PROBB(0,9)=0.833748   PROBB(1,9)=0.166252
PROBB(0,10)=0.817073  PROBB(1,10)=0.182927
SYSTEM THROUGHPUT IS: 1.036608
Time in sys is: 4.626111
Lambda is: 3.027082


Version  1(2) 


CrlOO.  ML  =  6.  6  =  10.  Sl  =  5.CCCC00.  R  e  s  Ic  wa  ( =  0  .  (  2  1 1  77  .  o  ve  r  F  p  a  0=  0 . 0 1  C  0  0  0  SC  1  :=2.  COCO  00  S  C2  3=  .1 . 8  0  00  0  0  SC  5  3=  1  .6  0  00  00  S  L 1  3=  1 . 5C  00  OC  S15J=1. 
Starting  Gwess  ky  =  1.00000C.  krt  =  l.CCrO0O.  kcs=i.OOOOCC.  IoL=0.0010C0.a<ir=1.0000C0 

Count  =8.  ki(  =  0  . 261829, WsdO.t'J  5:71, kts:a. 068  31  C.DIFFII  kb  )  =  J0.Q00ill. GIF  F2(  km  1)  =-0.0000  67. U  1FF5  (  .  c  s  )  = -0 .0  0  03  61 
Otcsd'r.  =  I.0CCC0P.  Sum(.'i,'ran=l.Cfroco,e)ipc=C 


rrfAN{l):C. 91190$ 
FriAr(;)=c.c537ii 
1 CR7P<2)=4.0C1357 
Pr.RAK(3)=C.0CCC20 
rtFSf.(1)=t.OCOCOO 
F&F7,N(5)=C.0CCGO0 
l’C.RAf.(6)=O.OCOOOO 

rcRAr.(7)  =  o.ocooou 
rcRAf;(P)  =  c.occcco 
fCRA1(9):C.CC0000 
Prtb  ol  grttlng  blcckvd  on 
Prtb  of  getting  Clockev  on 


CCC(C)=0.CCC16Q 

PFCriF  t,  9) 

z 

0.000138 

CCCd  7  =  0.00252  5 

PROH( 1,9) 

r 

0 .00208  7 

CCC ( 2 ) =0 .0) 7235 

PR01'(  2,9) 

r 

C .0 11 0/ 2 

OCC(3)=0.066  793 

PR0H( 3,9) 

z 

9 .055319 

OCC(1)  =  C.161';50 

FRPH( 1,9) 

z 

0.139952 

CCC(5)=0. 251625 

PROB(5,9) 

z 

0 .235915 

0CC(6)=C. 26021 1 

PR'OlU  6 ,9  ) 

z 

0 .?b5 120 

fcCC( 7)=C. 166361 

PF0B( 7,8) 

z 

0.191533 

CC6(b)=0. 060113 

F'RCB(8,9) 

= 

0.080716 

CCC(9)=0. 009196 

PR0b(9,9) 

= 

0.015118 

o  1 
ot 


re  cues  t  s  Is: 
reoues  ts  Ik: 


0  .911771 
0  .055091 


MU  <0  >  =  1.807198 
MW  (1  )=1.  807196 


PROBB(0,1)=0.990000   PROBB(1,1)=0.010000
PROBB(0,2)=0.980100   PROBB(1,2)=0.019900
PROBB(0,3)=0.970299   PROBB(1,3)=0.029701
PROBB(0,4)=0.960596   PROBB(1,4)=0.039404
PROBB(0,5)=0.950990   PROBB(1,5)=0.049010
PROBB(0,6)=0.941480   PROBB(1,6)=0.058520
PROBB(0,7)=0.932065   PROBB(1,7)=0.067935
PROBB(0,8)=0.922745   PROBB(1,8)=0.077255
PROBB(0,9)=0.913517   PROBB(1,9)=0.086483
PROBB(0,10)=0.904382  PROBB(1,10)=0.095618
SYSTEM THROUGHPUT IS: 0.711688
Time in sys is: 8.128113
Lambda is: 1.807198




Implementation of the simulation model described in Chapter 3. Written in C and run
on a PDP-11/45.
Implementation of the two-phase locking simulation model described in Chapter 5.
Written in GPSS and run on an IBM 3033.
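For orientation, the core bookkeeping of a two-phase locking scheme can be sketched in a few lines of Python. This is an illustrative sketch only, not a translation of the GPSS program: it assumes exclusive locks on granules, first-come-first-served waiting, and release of all locks at once when a transaction finishes. The class and method names are hypothetical, and deadlock handling and replication are left out.

```python
from collections import deque


class LockTable:
    """Exclusive locks on granules with FIFO waiting (illustrative only)."""

    def __init__(self):
        self.holder = {}    # granule -> transaction currently holding the lock
        self.waiting = {}   # granule -> deque of transactions waiting for it

    def acquire(self, txn, granule):
        """Growing phase: return True if granted, False if txn must wait."""
        owner = self.holder.get(granule)
        if owner is None or owner == txn:
            self.holder[granule] = txn
            return True
        self.waiting.setdefault(granule, deque()).append(txn)
        return False

    def release_all(self, txn):
        """Shrinking phase: drop every lock of txn; return the new grants."""
        granted = []
        for granule in [g for g, t in self.holder.items() if t == txn]:
            queue = self.waiting.get(granule)
            if queue:
                nxt = queue.popleft()
                self.holder[granule] = nxt
                granted.append((nxt, granule))
            else:
                del self.holder[granule]
        return granted


locks = LockTable()
print(locks.acquire("T1", 7))    # True
print(locks.acquire("T2", 7))    # False
print(locks.release_all("T1"))   # [('T2', 7)]
```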

ut'OAic  HFOUESr  jrHVict  1 1  mi  ;  »xh  i  o.  2/xxxxx» 

1 

COMMON  IC AT  ION  MCSSACC  DCLAY:  A XH I  1 , 2 / X X X X X * 

2 

PENALTY  FCH  RCSTAMTINO  TRANS:  A XM I  > , 2 / XX X XX A 
2  •  - 
OCAOLOCK  TIMtCOT  T I  ML  :  A X  hi  2 .2 /XX X XX  A 

5 

KE50CT5  _  . 


SINULATICN  RON  LENGTH:  AXl,2/XXXXXA 

3  _  _ 

NO  CF  CCMPLEt ICNS:  AX7,2/XXX>XA 
A 


THRCUGHPUT  is:  AXIA  ,2/3LXXX.XXXA  JOBS/ONIT  TIME 

3 

AVL  RlSP  time  is:  a  1  art  I  me  a 3/XXXXX . X  a  time  units 
3 

NO  OF  OEAOLCCKS  RCSCLVCO:  A XF 1 3 a  2/ X X XX X  A 

''NUMeER  "’C('  ■  BUI^S  AX  J<J9  a2/xxx  A 
3 

overall  AVE  XPUT:  AX39St2/3LXXX.XXXXA  JUUS/UMT  TIME 

'overall  ave'pesp  time;  AX’VA ,2/XXXXXA  time  uKuTS" 

3 

overall  AVE  DEADLCCXS  FCUNOI  A X39 3  a  2/ X X XXX A 


// 


Implementation of the timestamp-ordered simulation model described in Chapter
5. Written in GPSS and run on an IBM 3033.
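As background for the listing, the textbook rules of basic timestamp ordering with the Thomas Write Rule can be sketched in Python as follows. This is a minimal sketch under simplifying assumptions: a single copy of each granule, integer timestamps, and no precommit phase or request queueing, all of which the GPSS model does handle. The names `Granule`, `read`, and `write` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Granule:
    read_ts: int = 0      # largest timestamp of any accepted read
    write_ts: int = 0     # timestamp of the last applied write
    value: object = None


def read(g: Granule, ts: int) -> str:
    """A read is rejected if a younger transaction has already written g."""
    if ts < g.write_ts:
        return "reject"
    g.read_ts = max(g.read_ts, ts)
    return "ok"


def write(g: Granule, ts: int, value: object) -> str:
    """Reject writes that arrive too late for a reader; skip obsolete writes."""
    if ts < g.read_ts:
        return "reject"
    if ts < g.write_ts:
        return "ignored"  # Thomas Write Rule: the write is obsolete, not an error
    g.write_ts = ts
    g.value = value
    return "ok"


g = Granule()
print(read(g, ts=5))               # ok
print(write(g, ts=3, value="x"))   # reject
print(write(g, ts=8, value="y"))   # ok
print(write(g, ts=6, value="z"))   # ignored
print(read(g, ts=7))               # reject
```

The "ignored" outcome is what distinguishes the Thomas Write Rule from plain timestamp ordering, where the same late write would force a restart.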


//  JOO  CLASS«=C 
/•JCfiPAKM  l-( 

/•PHIOKIIY  A  __ 

//  tXtC  CPSSV 

//(.pss.ivs  IN  on  * 

PC  ALLCJCATC  F  AC. 0. srn. 30. CLf, O.LCjvi.  10  O.TAC.btl-UN,‘j,U:;V.O.U:»V,0 

_ Bt  AU.UCAJ£.CH  A.  30  0  .GHH.  0.  MAC  .lU.LMO.O.HMi.O.XAC.OOO.CCM.iJOOCO  . . 

tJNL  IbT 
SI MtLATE 

***********************************  ***^*  ***♦**♦**♦♦».**♦*♦♦♦* 

BASIC TIMESTAMP ORDERING WITH THE THOMAS WRITE RULE SIMULATION PROGRAM

AUTHOR: BRUCE I. GALLER

DATE: MAY 20, 1982


TRANSACTIONS IN THIS MODEL HAVE A NUMBER OF PARAMETERS THAT DEPENDS
ON THE NUMBER OF READS (V$NOLR) AND PRE-COMMITS (V$NOPC) THAT THE
TRANSACTION PERFORMS. LISTED BELOW ARE THE PARAMETERS AND THEIR
INTERPRETATIONS OR CONTENTS:


PARAMETER                              INTERPRETATION OR CONTENTS

1                                      TEMPORARY STORAGE
2                                      TRANSACTION ID
3                                      NUMBER OF PRECOMMITS PROCESSED
4                                      PROCESSOR THREAD TRANS OCCUPIES
5                                      ORIGINATION SITE NUMBER
6                                      NUMBER OF READS PROCESSED
7                                      TEMPORARY STORAGE (OFTEN SITE #)
8                                      COUNT OF # OF SITES TO PRECOMMIT
9                                      CURRENT PRIMARY COPY SITE
10                                     TRANSACTION START TIME
11                                     TEMPORARY STORAGE
12                                     TEMPORARY STORAGE
13                                     READ SERVICE TIME
14                                     PRECOMMIT SERVICE TIME
15                                     UPDATE SERVICE TIME
16 -> 16+V$NOLR-1                      READ GRANULES REQUESTED
16+V$NOLR-1 -> 16+V$NOLR+V$NOPC-2      PRECOMMIT GRANULES REQUESTED


STORAGES ARE USED IN THIS MODEL TO INDICATE THE LOAD AT EACH SITE.
THERE IS A STORAGE ASSOCIATED WITH EACH SITE. THERE IS A LOGIC
SWITCH ASSOCIATED WITH EACH GRANULE SO THAT REQUESTS CAN QUEUE WHEN
A PRECOMMIT HAS BEEN ACCEPTED AT THIS SITE BUT THE UPDATE REQUEST
HAS NOT YET ARRIVED. TABLES ARE KEPT PURELY FOR STATISTICAL RESULTS.
USER CHAINS ARE USED TO HANDLE QUEUES SO THAT THE NECESSARY QUEUEING
DISCIPLINE CAN BE FOLLOWED.

.♦  1  •  »»♦♦♦»♦♦*»»»*»»♦»** ***_**********  ************^******.*  ********** 


*  RUNTIMEB 

GENERATE  30000 
UNLIST  AHS 


LENGTH OF SIMULATION


**♦**♦♦♦♦♦♦*♦♦*♦•  4* *•♦*♦♦  +  ♦♦**♦***♦»*♦♦**. 

SAVE  ALL  MODEL  PARAMETERS  AND  ACCUMULATE  CUTPUT  A  T  | j^j  jj  _A  ^T  E  R_  E_^C  H  RUN 


tt******************** ******************************************  ********** 


SAV6VALUE _ |j  C  I  _ _ 

SAVEVALUE  T.NS^NT 
SAVEVALUF.  I4.V1XPUT 
SAVEVALUE  I,ViNOS..M 

_S  Ay_£  VALUE  2.VMN  0  GfiA  jJtL _ 

SAVEVALUE  3,VSMPL.H 
SAVEVALUE  A.VSNOLR.H 
SAVEVALUE  S.V4NOPC.H 
SAVFVALUE _ ::$j  V * TH  I NK  ,  H _ 

sa'vevalue  T.VSMEAN.H 
SAVEVALUE  8.VJLRS.H 
SAVEVALUE  9.V$PCS.H 

SAVE value  10. V *  U P  S . H _ 

SAVEVALUE  ll.VSCCMS.H 
SAVEVALUE  I5.V*PNTY.H 
SAVEVALUE  16.N»fiJECT 


SIMULATION RUN LENGTH-USED FULLWORD PAR
USED FULLWORD PARAMETER


***********************************************************  ********* 

SAVEVALUES 393-399 ARE USED TO SAVE VALUES ACROSS RUNS

tt******************'**  **********  ***i^i*T****  **  *  *  *  4*  ♦**!>"♦■*"**  4  44'4mn(  4ir44>4r44~ 
SAVEVALUE  3g6*,X7 

_ SAVEVALUE _ 397*  tTESRTlME _ _ _ _ _ 

SAVEVALUE  3984. XI6 
SAVEVALUE  399*.  I 
SAVEVALUE  396.V#SVTHfl 

_ S  AyEYAJUPS _ ^35.4.1  V  *  5  V.RT - - - 

SAVfcVALUE  393.V4SVRJ 
TERMINATE  1 


*  GENERATE TRANSACTIONS
GEN  STARTMACRO 

GENERATE  VSME AN . FN IE XP . .VSMPL. .VSNOP.F 

ASSIGN  S.AA 

- it  vc  v'ATur'i^RL  N  crrvr,  f  'ti;k 

ASSIGN  4.VSPHMPL 

TRANSFER  .GRAN 

_ _ ENOMACRO  .  _  _  _ ...  . 

*  COMMUNICATIONS DELAY MACRO

CMC  STARTMACRO 

ENTER  VSCOM.l 

_ ADVANCE _ _y»COMS  .FNILXP _ _ _ 

LEAVE  VSCCM.  I 

ENOMACRO 

*  CHOOSE GRANULES MACRO

_ R£QQ._.STARIMACK0 . .  . . .  .  ....... 

ASSIGN  1.0 

INDEX  VSLOCP.O 

SAVEVALUE  I.Pl.H 

_ SAVLVALUE _ 2iV$LOC,F,H_ _  _ 

SAVE  VALUE  2-.I.H 
#0  TEST  G  XH2 . VGCFFST  .  »A 

ASSIGN  1.0 

_ _ _  INDEX  _ XH2tO  ... 

SAVEVALUE  3.P  I  .H 
TEST  Nc  XHI,XH2.AC 
SAVEVALUE  2- . I .H 
TRANS!  FR  ,*R 

- TK'fyTfJnnjo - 


4  I  •  A  »«  I  't  H  •  # 

^  A  T »  r,  M  j  N  \  n- 

r  ^CMA<.h(> 

J  FCNC  I  i*  f.  ^;ni  #C^4 

^  ri  11  I  I  y  .»>  P  I  1  y  .1  I 

0 
0 


EXPONENTIAL DISTRIBUTION
0,0/0.1,0.104/0.2,0.222/0.3,0.355/0.4,0.509/0.5,0.69
0.6,0.915/0.7,1.2/0.75,1.38/0.8,1.6/0.84,1.83/0.88,2.12
0.9,2.3/0.92,2.52/0.94,2.81/0.95,2.99/0.96,3.2/0.97,3.5
0.98,3.9/0.99,4.6/0.995,5.3/0.998,6.2/0.999,7/0.9998,8


♦  / 
t 
♦ 

> 

T  10 

VAR  1  able 

Cl  A  100  APS 

1  NANS  AC  T  ICN  ID 

NPLNO 

> 

♦ 

♦ 

matrix 

Ilf  1  (  10 

CMONT  OF  NOMIII  H  tr  AC.riVC  TOANS 
r  H«)M  c  ACM  u  1 T  r 
j  or  »iu«s  •  1 

4  OF  COI.S  >•  V^NU'i 

TPANS 

9 

matrix 

X.  1 . 20 

ID  OF  ACTIVE  transactions 
*  CF  RO»S  =  1 

0  CF  COLS  >=  VtMFL*V»N0S  . 

« 

• 

READ 

matrix 

X.  1 .60 

READ  TIMESTAMPS 

0  OF  HOWS  a  1 

• 

•  OF  CCLS  >=  v»nOs*vanCGi^a 

WRITF. 

4 

MATRIX 

X.  I . ao 

XRtTC  timestamps 

0  OF  ROnS  ^  1 

4 

0  OF  CCLS  >=  V INOS*V*NOGRA 

PHMPL 

4 

VAR  I  able 

V»MPL* IPS- 

1  ) AMNSMPLNOI  1 .P5I 

4 

4 

OK.r 

JZ 

P  JCM 
R  J  T  t  M 
RT  IMP 

♦ 

CFFSI' 

NOS 

« 

_.-NCGR<S_ 

* 

NCLP 

NCPC 

_ NOP _ 

MPL 

• 

3T0RR- 


TAGLC 

TAMCF. _ 

TABLE 
T  40UC 
table 

Var  I  able  ' 

VARIABLE 

VARIABLE^. 

VARIABLE 

variable 

variable 

VAR  I  ABLE 


TABLE OF NONREJECTED TRANS BY THREAD
TABLE OF REJECTED TRANS BY THREAD
COUNT OF # OF TIMES TRANS IS REJECTED
RESP TIME INCLUDING REJECT

TABLE TO KEEP TRACK OF RESP TIME


PA  .  1 . 1  . 30 
_  P4 , I  ,  I  ,  30 
P16.0. I . JO 
r>  I  .0.30.20 
MPt0.0.3C.20 

_ 

2  NUMBER  OF  SITES 

IF  VSNDS  15  changed.  CHANGE  MATRICES 

_ la _  NUMBER  OF  GRANULES  PER  SITE  _ 

IF  VWNOGRA  I'S  changed.  CHANGE  MATRICES" 

;  NUMBER  OF  LOCAL  READS 

1  NUMBER  i:F  PRLCBMMITa 

_ ytt^LuRiV  TNCPC  ♦  VtUFF  ST  NUMBER  OF  PARAMETERS  PER  TRANS 

10  -  muL TIPROGBAMMING  level  AT  EACH  "Site 


V^fllABLE _ V*N0GRA4  (P7- I  )  ♦f’l  LOGICAL  GRANULE  NUMBER  _ 


*  VARIABLES USED TO CALCULATE OUTPUT STATISTICS

* 


XPUT 

svthr 
SVRT 
SV'IJ  . 

FVARI APLC 
f  VAPI XLLE 

fvari  arlf. 
r VAR  I  able 

LCCL 

variable 

1000* ( *7/C 1 ) 

IOJ0*(X3Sfc/(Cl»XJ99))4.0O05 
XJ97/X  399A .5 
XJ9B/AJ99T.S  


LCCP  VARtABLE 


P6+V$OFFST  COUNT OF READS PROCESSED ALREADY

V$NOLR+P3+V$OFFST  COUNT OF PRECOMMITS PROCESSED ALREADY


NdSLl  _VA»T.IASLE _ V»NP5:r.| _ 


MEAN  variable 


F  V  AR I ABL  E 

V  A!T  I  ABLE 

V  am  I  ABLE 
VARIABLE 

VAR  I  able 
VARIABLE 

VAR  I  able 

VAR  I ADLG_ 
WAR  I  ABLE 

variable 


I 


MEAN  INTERARRIVAL  TIME-EACH  SITE 


UNIFORM  OIST  l.( 
READ REQUEST SERVICE TIME

LOAD-DEPENDENCY

PRECOMMIT SERVICE TIME

LOAD-DEPENDENCY
UPDATE REQUEST SERVICE TIME
LOAD-DEPENDENCY
COMMUNICATION DELAY


CCV 

V AP 1  able 

VtNOSA I 

5  I  NO 

VAR  I  ABLE 

7 

LI  Sfy 

t;  VAR  I  ABLE 

XB*L*P2 

E0U7 

U  V  A  o  I  A  BL  E 

p;;  ce  •  X  7 

MKLT 

F.  VARIABLE 

P  3 ‘NF •  C  AP2 'L • X  7 

POt.  T 

E VAR  I  ABLE 

PJ  •  E  •  3  AP2.  fL  •  X7 

LTH7 

F  V  AR  I  AIJFE 

P-':*L*X7 

equal 

ev  am  I AEL  E 

xij  •  e  •  p  ? 

•  RTF 

E VAR  I  able 

PJ • NE  •  0 

PCF4 

eVAP I  ABLE 

P  J • L • 0  AP2 'L • XF 

V»NOGRA*RN2/1000FI 
SO 

P12»S*7*10 
100 

P I 4FS4  7* I  0 
60 

PI5FS*7»10 
1  THINK  TIME 

I  penalty  for  restarting  TRANS  (MUST  BE  >*II 


COMMUNICATION  DELAY  STORAGE  NO 


SITE  NUMBER 


i  *  •  «  •  I 


*•****»< 

* 

*  PROCESSING BEGINS WITH EACH SITE GENERATING TRANSACTIONS

* 

'  A  A  4  *  •  A  a'a'a  AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaAAAAAaAAAAAI 


CFN 

GLN 


M  ACRC 

_  MACRO.. 


I 

_ 2_. _ 


REQUEST  GRANULES 


OR  AN 


PCRAN 


LICPl 

urori 

B  K  A  Y  1 


ASSIGN 
ASS  !i.N 
MS  A  VuV  AL'JL 
T  I.  •.  I  I; 

A  S  S I GN 
AS-i  I  GN 
assign  _ 
A  S  1  I  GN 
A  jS  IGN 

I  .’.ni  X 
r  (.  j  I  E 
ASSIGN 
M  AC  1-0 

assign 

I  r.iu  A 
‘A  '  '•  I  .GV 


2,v»r  10 
10. ct 

IP ANS . I .P4 ,P2 
PI  T .0 . PGR  AN 
I  5 . V»LWS .E  XP 
1 4 . V »pc  s . r  xp 
r>  ,v  TUPS. EXP 
6  »  .  I 
I  .  0 

V  n.rcL  .c 

t  t . 0 .OKA Y I 

V  H.BCL  .  V 1N0G 

U'  AI  1  .  It  ST  I  .LNI.I'I 

I  .0 

V  H  .0 


ASSIGN TRANSACTION ID
ASSIGN TRANSACTION START TIME
NOTE TRANSACTION ID


ASSIGN READ SERVICE TIME
ASSIGN PRECOMMIT SERVICE TIME
ASSIGN UPDATE SERVICE TIME
INCREMENT COUNT OF READS


CHOOSE GRANULE

TEST IF GRANULE ALREADY CHOSEN


JS". iiTTi  'Tiir  Hi.;Miir<r  TO  rrr 


'  r. 


/  '• 


\ 


>  •  I 


RHtAU 


^  ^  J  »  .t  • 

TLOI  L 


Gate  Li^ 

_XBEAO  MSAVGVALUE 
ENTCn 
ADVAfJCE 
LEAVE 

_ TEST  OE  _ 

transfer 
S AVEVALUe 
UNLINK 

TPANSf.L8_ 

L  INK 
LINK 


MX$WRITE(1,P1),P2,RJECT  READ CAN PROCEED IF
                         TIMESTAMP LATER THAN WRITE
                         TIMESTAMP ON DATA ITEM.
                         REJECT OTHERWISE


IF LOCK SET THEN TEST QUEUE
P2  UPDATE READ TIMESTAMP

STORAGE INDICATES LOAD AT ASSOCIATED
READ REQUEST SERVICE


TESTO 


LNKIT 

OREAD 


PI  ,TESTC 
RCAO. I »RI 
P5,  I 
VTOLRS 
P5,  1 

_E<>tViNCLRiRGRAN..  _ 

.PGRAN 

8.P2 

PI tLNKIT.I .BVTLESS. tCREAD 
iXREAO 

'PliP2  QUEUE  unlinked 

PI tP2  CUEUE  READ 


SI  TE 


TRANSACTION 


pREcawrr 


PGRAN 


ASSIGN 
ASSIGN 
I  NOEX 


TEST  E 
LCOP2  ASSIGN 
REDO  MACRO 

OKAY2  ASSIGN,, _ 

INDEX 

ASSIGN 

SPLIT 

NFXT  TEST  NE 


CMC 
NOCOM 


MACRO 
ASSIGN 
TEST  e 

"T~EST~6E 

msavevalue 

SPL  IT 


3Ht  I 

1.0 

JVSLC^.O  _ _ _ 

PI.(j.0KAT5~ 

V»LOCP.V»NOG  CHOOSE  GRANULE 

CKAY2.TEST2,LCCP2 

_I,.0  _  _ 

V*LOCP  .0 
7.0 

VSNOSL 1 .NEXT .7  DO PRECOMMITS IN PARALLEL

P7 .P5 .NOCCM  IF LOCAL PRECOMMIT THEN NO COMM DELAY

ELSE COMMUNICATION DELAY


INCREMENT COUNT OF PRECOMMITS


* 


I  . VSSTCRR 

MX$TRANS(1.P4).P2.ASS  SEE IF TRANSACTION HAS ALREADY
BEEN RESTARTED

MX$READ(1.P1).M2.ACCPT  CHECK READ TIMESTAMP OF DATA
ITEM TO SEE IF PRECOMMIT CAN BE ACCEPTED
TRANS.1.P4.0  REJECT TRANSACTION

l.ASS  DON'T MAKE OTHER PRECOMMITS WAIT ANY LONGER
THAN NECESSARY TO KNOW THIS PRECOMMIT WAS REJECTED


* 

* 


RELEASE LOCKS PROCESSING


ALL READS AND LATER WRITES THAT ARRIVED AFTER PRECOMMIT BUT BEFORE
UPDATE REQUEST MUST BE PROCESSED AS IF THEY JUST ARRIVED.


* 

« 

BACK3  ASSIGN 

ASSIGN

INDEX
ASSIGN
SAVEVALUE
UNLINK


DNWRO 


PROS 


LOOP 
LOOP
TRANSFER
TEST  E


7. VSNOS 

_ 

VSLOCP .0 
1  .VSSTCRR 

8. P2 

PI .PROS . 1 .evsE: 

^ZJTz - 


LOGICR
TRANSFER

SKIP2  UNLINK

TRANSFER


RELNK  SAVEVALUE

SPLIT
LINK

ULINK  UNLINK

TRANSFER


7T3A< 

3  .BACK3 
.rject 

,CH*JLiJ  ..SKJF2- _ 

PI 

.XIT4 

PI .RELNK. 1 .BVSSRTE. .ARE AO 
.X  IT  1  I  


UAL.. DNWRO  UNLINK CORRESPONDING PRECOMMIT


ARE  AO 


ASS 

ASSl 


UNLINK
LOGICR
TRANSFER
ASSIGN
TEST  E

ASSEMBLE

TEST  E

TEST  GE


"r.p?" 

1 .ULINK 
PI  .P2 

PI  .RRE AC . ALL .eVSROBA 
“.'XTT9 

PI .RREAC. ALL 
PI 

_  _ 

1  .PI 

MXSTRANS ( I  .P4 I .P2.X IT2 
VSNQS 

MXSTRANSI  I  .P4  )  fPitX.lTl 
P3.VSNCPC. PGRAN 


UPDATES 


SKIPl 

CMC 


♦ 

« 


ASSIGN  3.V$NOPC

ASSIGN  7.0

SPLIT  VSNOSU I . SX IP  iTT 

TEST  NE  P5.P7.LOCAL  NO COMMUNICATION DELAY FOR LOCAL UPDATE

MACRO  COMMUNICATION DELAY FOR UPDATE

ASSIGN  1.0

INDEX  V$LOCP.0

ASSIGN  1.V$STCRR

SAVEVALUE  7.P2

UNLINK  PI  .X  ITC  f_  L.  B  VS.E  C  U7  .  .  NX  UP


TRANSACTION WAS STILL THERE SO PROCESS PRIOR READS AND WRITES


UNLINK  P1.RJECT.ALL.BV$RDLT

UNLINK  PI  .XI  T7.ALL  ,BVI..RLT 

MSAVEVALUE  WRITE.1.P1.P2  UPDATE WRITE TIMESTAMP

ENTER  P7.1  STORAGE INDICATES LOAD AT ASSOCIATED SITE

ADVANCE  V$OUPS  SERVICE FOR UPDATE REQUEST

LEAVE  P7.1

PROCESS READS THAT COME BEFORE NEXT WRITE


UNLINK
BUFFER

TRANSFER  ,NXUP
_ KELNI-.SAVtVALDt _ ltd _ 


PI.RELNI.I.OVSSRre..ARE| 


SPLIT

LINK

SKP1  UNLINK

TRANSFER

ARE1  UNLINK
LOGICR
TRANSFER

_.N*UH._.LCCP  _ J.LCCAC 

ASSEMBLE  V$NOS
TABULATE

MARK

TABULATE

MSAVEVALUE
CNT  TRANSFER


I . SKPI 
PI  .P? 

PI  .RRE  A0.ALL.BVSR0LT..Xir3 

.XI  T6 

PI ,NHF AC.ALL 
PI 

,NXUP 


HJTl  M 

RTIME 
TRANS . 
.RESTA 


I  .P4 .0 


TRANSACTION'S PRECOMMIT WAS ACCEPTED
WAITS FOR ASSOCIATED UPDATE REQUEST


,0  SET LOCK WHILE PRECOMMIT


ACCPT 


LOGICS
SPLIT

"I  I  MIT 


M  I 

I  ,  5l  liVP 
"FI  ,07 . 


SET LOCK AND WAIT FOR UPDATE


c 


cr<c 


ADVANCE
LEAVE
TL-.T  NG  ■' 

X  A(,(J() 
rw  AN'iF  EH 


»'  I  *  i 

V  VCf’CS 

M.’.  i 

'pr. p*),  As<; 

•  A  S  3 


SERVICE FOR PRECOMMIT


COMMUNICATION DELAY FOR PRECOMMIT ACK


RESTA  ADVANCE
TRANSFER
RJECT  ADVANCE


VHH  1  NK  ,FN  IE  AP  THINK  I  I  ML  / 